PREFACE


Geodesy begins with measurements from control points. Geometric geodesy measures 
heights and angles and distances on the Earth. For the Global Positioning System (GPS), 
the control points are satellites and the accuracy is phenomenal. But even when the mea- 
surements are reliable, we make more than absolutely necessary. Mathematically, the po- 
sitioning problem is overdetermined. 

There are errors in the measurements. The data are nearly consistent, but not exactly. 
An algorithm must be chosen—very often it is least squares—to select the output that 
best satisfies all the inconsistent and overdetermined and redundant (but still accurate!) 
measurements. This book is about algorithms for geodesy and global positioning. 

The starting point is least squares. The equations Ax = b are overdetermined. No 
vector x gives agreement with all measurements b, so a “best solution” $\hat{x}$ must be found.
This fundamental linear problem can be understood in different ways, and all of these ways 
are important: 


1 (Calculus) Choose $\hat{x}$ to minimize $\|b - Ax\|^2$.
2 (Geometry) Project b onto the “column space” containing all vectors Ax.
3 (Linear algebra) Solve the normal equations $A^{T}A\hat{x} = A^{T}b$.


Chapter 4 develops these ideas. We emphasize especially how least squares is a 
projection: The residual error $r = b - A\hat{x}$ is orthogonal to the columns of A. That means
$A^{T}r = 0$, which is the same as $A^{T}A\hat{x} = A^{T}b$. This is basic linear algebra, and we follow
the exposition in the book by Gilbert Strang (1993). We hope that each reader will find 
new insights into this fundamental problem. 
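
The three views can be checked on a tiny example. The sketch below uses made-up data (three points fitted by a line C + Dt; none of these numbers come from the preface) and MATLAB's backslash to solve the normal equations. It then confirms that the residual is orthogonal to the columns of A.

```matlab
% Illustrative data only: fit a line C + D*t to three points.
t = [0; 1; 2];
b = [6; 0; 0];
A = [ones(3,1) t];          % the columns of A span the column space

xhat = (A'*A) \ (A'*b);     % solve the normal equations A'A xhat = A'b
p = A*xhat;                 % projection of b onto the column space
r = b - p;                  % residual error

disp(xhat')                 % best C and D: 5 and -3
disp((A'*r)')               % orthogonality check: zero up to rounding
```

For larger or poorly conditioned problems one would normally let `A \ b` solve the least squares problem directly instead of forming A'A.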

Another source of information affects the best answer. The measurement errors have 
probability distributions. When data are more reliable (with smaller variance), they should 
be weighted more heavily. By using statistical information on means and variances, the 
output is improved. Furthermore the statistics may change with time—we get new infor- 
mation as measurements come in. 

The classical unweighted problem $A^{T}A\hat{x} = A^{T}b$ becomes more dynamic and
realistic (and more subtle) in several steps:


- Weighted least squares (using the covariance matrix $\Sigma$ to assign weights)
- Recursive least squares (for fast updating without recomputing)
- Dynamic least squares (using sequential filters as the state of the system changes).
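
A minimal sketch of the first two steps, with made-up numbers: three measurements of one unknown, weighted by the inverse of an assumed diagonal covariance matrix (called `Sigma` here), followed by a recursive update when a fourth measurement arrives. The update uses a scalar gain `K`, which is the simplest instance of the filter idea; the variable names and data are illustrative assumptions, not the book's M-files.

```matlab
% Made-up data: three measurements of a single unknown x.
A     = [1; 1; 1];
b     = [10; 13; 10];
Sigma = diag([1 1 0.25]);          % covariance: the third measurement is most reliable

xu = (A'*A) \ (A'*b);              % unweighted estimate (plain average): 11
P  = inv(A'*(Sigma\A));            % variance of the weighted estimate: 1/6
xw = P * (A'*(Sigma\b));           % weighted estimate: 10.5

% A fourth measurement arrives: b4 = 11 with variance 0.5.
b4 = 11;  s4 = 0.5;
K    = P / (P + s4);               % gain: how far to move toward the new measurement
xnew = xw + K*(b4 - xw);           % updated estimate: 10.625
Pnew = (1 - K)*P;                  % updated variance: 0.125
```

The recursive result agrees with recomputing the weighted average of all four measurements (weights 1, 1, 4, 2), but it needs only the previous estimate and its variance.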


The Kalman filter updates not only $\hat{x}$ itself (the estimated state vector) but also its variance.
Chapter 17 develops the theory of filtering in detail, with examples of positioning
problems for a GPS receiver. We describe the Kalman filter and also its variant the Bayes 
filter—which computes the updates in a different (and sometimes faster) order. The formu- 
las for filtering are here based directly on matrix algebra, not on the theory of conditional 
probability—because more people understand matrices! 

Throughout geodesy and global positioning are two other complications that cannot 
be ignored. This subject requires 


- Nonlinear least squares (distance $\sqrt{x^2 + y^2}$ and angle $\arctan(y/x)$ are not linear)
- Integer least squares (to count wavelengths from satellite to receiver). 


Nonlinearity is handled incrementally by small linearized steps. Chapter 10 shows how 
to compute and use the gradient vector, containing the derivatives of measurements with 
respect to coordinates. This gives the (small) change in position estimates due to a (small) 
adjustment in the measurements. 
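
The idea of small linearized steps can be sketched in two dimensions. The example below is an assumption-laden illustration, not the book's code: the control points `S`, the simulated position `ptrue`, and the starting guess are all invented. Each pass linearizes the distance equations with their gradients and solves an ordinary least squares problem for the correction.

```matlab
% Hypothetical 2-D example: locate p from distances to three known points.
S     = [0 0; 10 0; 0 10];                 % known control points (one per row)
ptrue = [3; 4];                            % used only to simulate the measurements
D     = repmat(ptrue', 3, 1) - S;
d     = sqrt(sum(D.^2, 2));                % "measured" distances (error-free here)

p = [1; 1];                                % rough initial guess
for k = 1:10
    D   = repmat(p', 3, 1) - S;            % rows are p - s_i
    rho = sqrt(sum(D.^2, 2));              % distances predicted from current p
    J   = D ./ [rho rho];                  % gradient of each distance w.r.t. (x, y)
    dp  = (J'*J) \ (J'*(d - rho));         % linearized least squares correction
    p   = p + dp;
    if norm(dp) < 1e-12, break, end
end
disp(p')                                   % converges to 3 4
```

With noisy distances the same loop applies; the corrections then settle at the least squares position rather than at an exact intersection.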

Integer least squares resolves the “ambiguity” in counting wavelengths—because the 
receiver sees only the fractional part. This could be quite a difficult problem. A straightfor- 
ward approach usually succeeds, and we describe (with MATLAB software) the LAMBDA 
method that preconditions and decorrelates harder problems. 

Inevitably we must deal with numerical error, in the solution procedures as well as 
the data. The condition number of the least squares problem may be large—the normal 
equations may be nearly singular. Many applications are actually rank deficient, and we 
require extra constraints to produce a unique solution. The key tool from matrix analysis is 
the Singular Value Decomposition (SVD), which is described in Chapter 7. It is a choice 
of orthogonal bases in which the matrix becomes diagonal. It applies to all rectangular 
matrices A, by using the (orthogonal) eigenvectors of $A^{T}A$ and $AA^{T}$.
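
As a small illustration of rank deficiency, the sketch below applies MATLAB's built-in `svd`, `rank` and `pinv` to a made-up rank-one matrix. The matrix and right side are assumptions chosen so the numbers are easy to check by hand.

```matlab
A = [1 2; 2 4; 3 6];                 % made-up rank-one matrix: column 2 = 2 * column 1
[U, S, V] = svd(A);                  % A = U*S*V' with orthogonal U (3x3) and V (2x2)
s   = diag(S);                       % singular values: sqrt(70) and (numerically) 0
lam = sort(eig(A'*A), 'descend');    % eigenvalues of A'A are the squares of s

b = [1; 2; 3];
x = pinv(A)*b;                       % minimum-norm least squares solution

disp(rank(A))                        % 1
disp(x')                             % approximately 0.2 0.4
```

Many vectors x satisfy Ax = b here; the pseudoinverse picks the shortest one, which is one way of adding the "extra constraint" that makes the solution unique.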


The authors hope very much that this book will be useful to its readers. We all have a 
natural desire to know where we are! Positioning is absolutely important (and relatively 
simple). GPS receivers are not expensive. You could control a fleet of trucks, or set out 
new lots, or preserve your own safety in the wild, by quick and accurate knowledge of 
position. From The Times of 11 July 1996, GPS enables aircraft to shave up to an hour 
off the time from Chicago to Hong Kong. This is one of the world’s longest non-stop 
scheduled flights—now a little shorter. 

The GPS technology is moving the old science of geodesy into new and completely 
unexpected applications. This is a fantastic time for everyone who deals with measure- 
ments of the Earth. We think Gauss would be pleased. 

We hope that the friends who helped us will be pleased too. Our debt is gladly 
acknowledged, and it is a special pleasure to thank Clyde C. Goad. Discussions with him 
have opened up new aspects of geodesy and important techniques for GPS. He sees ideas 
and good algorithms, not just formulas. We must emphasize that algorithms and software 
are an integral part of this book. 

Our algorithms are generally expressed in MATLAB. The reader can obtain all the 
M-files from http://www.i4.auc.dk/borre/matlab. Those M-files execute the techniques
(and examples) that the book describes. The first list of available M-files is printed at the
end of the book, and is continuously updated in our web homepages. Computation is now 
an essential part of this subject. 

This book separates naturally into three parts. The first is basic linear algebra. The 
second is the application to the (linear and also nonlinear) science of measurement. The 
third is the excitement of GPS. You will see how the theory is immediately needed and 
used. Measurements are all around us, today and tomorrow. The goal is to extract the 
maximum information from those measurements. 


Gilbert Strang
MIT
gs@math.mit.edu
http://www-math.mit.edu/~gs

Kai Borre
Aalborg University
borre@i4.auc.dk
http://www.i4.auc.dk/borre


This popular article is reprinted from SIAM News of June 1997.
Then Chapter 14 begins an in-depth description of the whole GPS system and its applications. 


THE MATHEMATICS OF GPS 


It is now possible to find out where you are. You can even discover which way you are 
moving and how fast. These measurements of position and velocity come from GPS (the 
Global Positioning System). A handheld GPS receiver gives your position on the Earth 
within 80 meters, and usually better, by measuring the range to four or more satellites. For 
centimeter accuracy we need two receivers and more careful mathematics; it is a problem 
in weighted least squares. Tracking the movement of tectonic plates requires much greater 
precision than locating trucks or ships or hikers, but GPS can be used for them all. 

This article describes some of the mathematics behind GPS. The overall least squares 
problem has interesting complications. A GPS receiver has several satellites in view, 
maybe six for good accuracy. The satellites broadcast their positions. By using the time 
delays of the signals to calculate its distance to each satellite, the receiver knows where 
it is. In principle three distance measurements should be enough. They specify spheres 
around three satellites, and the receiver lies at their point of intersection. (Three spheres 
intersect at another point, but it’s not on the Earth.) 

In reality we need a minimum of four satellites. The four measurements determine 
not only position but clock time. In GPS, time really is the fourth dimension! The problem
is that the receiver clock is not perfectly in sync with the satellite clock. This would cause 
a major error—the uncorrected reading is called a pseudorange. The receiver clock is not 
top quality, and the military intentionally dithers the satellite clock and coordinates. (There 
was a sensational rescue of a downed pilot in Serbia, because he carried a GPS receiver 
and could say exactly where he was. The dithering is to prevent others from using or 
sabotaging the system.) The President has proposed that this selective availability should 
end within ten years, and meanwhile GPS scientists have a way around it. 


Clock Errors 


The key is to work with differences. If we have two receivers, one in a known and fixed 
position, the errors in the satellite clock (and the extra delay as signals come through the 
ionosphere, which is serious) will virtually cancel. This is Differential GPS. Similarly an 
extra satellite compensates for error in the receiver clock. Double differencing can give 
centimeter and even millimeter accuracy, when the other sources of error are removed and
measurements are properly averaged (often by a Kalman filter). Southern California will
soon have 250 continuous GPS receivers, watching for movements of the Earth’s crust. 
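
The cancellation by differencing can be seen in a few lines. This is a sketch under a simplified measurement model that is assumed here, not quoted from the article: pseudorange = true range + c (receiver clock error - satellite clock error). All ranges and clock errors are invented, and ionospheric delay is left out.

```matlab
c   = 299792458;                    % speed of light (m/s)
rho = [20.1e6 21.7e6;               % true ranges (m): rows = receivers A and B,
       20.3e6 21.4e6];              % columns = satellites i and j (made-up numbers)
dtr = [3e-4; -1e-4];                % receiver clock errors (s), hypothetical
dts = [2e-6 -5e-6];                 % satellite clock errors (s), hypothetical

P  = rho + c*(repmat(dtr,1,2) - repmat(dts,2,1));   % simulated pseudoranges

sd = P(1,:) - P(2,:);               % single differences: satellite clocks cancel
dd = sd(1) - sd(2);                 % double difference: receiver clocks cancel too

dd_true = (rho(1,1) - rho(2,1)) - (rho(1,2) - rho(2,2));
disp(dd - dd_true)                  % zero up to rounding
```

The double difference depends only on geometry, which is why it is the natural observation for high-accuracy work.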

The carrier signal from the satellite is modulated by a known code (either a coarse 
C/A code of zeros and ones, or a precise P code). The receiver computes the travel time 
by synchronizing with that sequence of bits. When the receiver code and satellite code are 
correlated, the travel time is known—except for the discrepancy between the clocks. 

That clock error $\Delta t$, multiplied by the speed of light, produces an unknown error
$c\,\Delta t$ in the measurement distance—the same error for all satellites. Suppose a handheld
receiver locks onto four satellites. (You could buy one that locks onto only three, but
don’t do it.) The receiver solves a nonlinear problem in geometry. What it knows is the
difference $d_{ij}$ between its distance to satellite i and to satellite j. In a plane, when we know
the difference $d_{12}$ between distances to two points, the receiver is located on a hyperbola.
In space this becomes a hyperboloid (a hyperbola of revolution). Then the receiver lies at
the intersection of three hyperboloids, determined by $d_{12}$ and $d_{13}$ and $d_{14}$.

Two hyperboloids are likely to intersect in a simple closed curve. The third probably 
cuts that curve at two points. But again, one point is near the Earth and the other is far 
away. The error in position might be 50 meters, which is good for a hiker or sailor but 
not close enough for a pilot who needs to land a plane. For $250 you could recognize 
the earthquake but not the tiny movements that foreshadow it. The serious mathematics of 
GPS is in using the phase of the signal and postprocessing the data, to reduce the errors 
from meters to millimeters. 
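
The four-satellite problem can also be solved directly for position and clock term by repeated linearization. The sketch below is an illustration with invented geometry (coordinates in kilometers, an invented clock term `btrue` standing for $c\,\Delta t$); it is not the receiver's actual algorithm and not the book's M-files.

```matlab
% Made-up satellite positions (km), one per row.
sat = [ 20000      0  17000;
            0  20000  17000;
       -20000      0  17000;
            0      0  26000];
ptrue = [1000; 2000; 6000];          % simulated receiver position (km)
btrue = 85;                          % simulated clock term c*dt (km)
n = size(sat, 1);
P = sqrt(sum((repmat(ptrue', n, 1) - sat).^2, 2)) + btrue;   % pseudoranges

x = zeros(4, 1);                     % unknowns (x, y, z, c*dt), start at Earth's center
for k = 1:20
    D   = repmat(x(1:3)', n, 1) - sat;        % rows are p - s_i
    rho = sqrt(sum(D.^2, 2));                 % ranges predicted from current position
    J   = [D ./ repmat(rho, 1, 3), ones(n, 1)];   % derivatives w.r.t. position and clock
    dx  = J \ (P - (rho + x(4)));             % linearized step (least squares if n > 4)
    x   = x + dx;
    if norm(dx) < 1e-9, break, end
end
disp(x')                             % approaches 1000 2000 6000 85
```

The column of ones in `J` is where the common clock error enters: it is the same unknown in every equation, which is why a fourth satellite is needed.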


Clocks Versus Lunar Angles 


It is fascinating to meet this world of extremely accurate measurers. In a past century, they 
would have been astronomers or perhaps clockmakers. A little book called Longitude, by 
Dava Sobel, describes one of the first great scientific competitions—to provide ship cap- 
tains with their position at sea. This was after the loss of two thousand men in 1707, when 
British warships ran aground entering the English Channel. The competition was between 
the Astronomers Royal, using distance to the moon and its angle with the stars, and a man 
named John Harrison, who made four amazing clocks. It was GPS all over again! Or more 
chronologically, GPS solved the same positioning problem later—by transmitting its own 
codes from its own satellites and using cesium and rubidium clocks. 

The accuracy demanded in the 18th century was a modest $\tfrac{1}{2}^{\circ}$ in longitude. The Earth
rotates that much in two minutes. For a six-week voyage this allows a clock error of three 
seconds per day. Newton recommended the moon, and a German named Mayer won 3000 
English pounds for his lunar tables. Even Euler got 300 for providing the right equations. 
(There is just more than I can say here.) But lunar angles had to be measured, on a rolling 
ship at sea, within 1.5 minutes of arc. 

The big prize was practically in sight, when Harrison came from nowhere and built 
clocks that could do better. You can see them at Greenwich, outside London. H-1 and 
H-2 and H-3 are big sea clocks and still running. Harrison’s greatest Watch, the H-4 that 
weighs three pounds and could run if the curators allowed it, is resting. It lost only five 
seconds on a long trip to Jamaica, and eventually it won the prize.

The modern version of this same competition was between VLBI and GPS. Very
Long Baseline Interferometry uses “God’s satellites”, the distant quasars. The clock at the 
receiver has to be very accurate and expensive. The equipment can be moved on a flatbed 
but it is certainly not handheld. There are valuable applications of VLBI, but it is GPS 
that will appear everywhere. You can find it in a rental car, and now it is optional in your 
Mercedes or BMW. The display tells you where you are, and the way to the hotel. The 
reader will understand, without any attempt at reviewing all the possible applications of 
GPS, that this system is going to affect our lives. GPS is perhaps the second most important 
military contribution to civilian science, after the Internet. But the key is the atomic clock 
in the satellite, designed by university physicists to confirm Einstein’s prediction of the 
gravitational red shift. 


Integer Least Squares 


I would like to return to the basic measurement of distance, because an interesting and 
difficult mathematical problem is involved. The key is to count radio wavelengths between 
satellites and receiver. This number (the phase) is an integer plus a fraction. The integer 
part (called the ambiguity) is the tricky problem. It has to be right, because one missing 
wavelength means an error of 19 cm or 24 cm (the satellite transmits on two frequencies).

Once the integer is known we try to hold on to it. A loss-of-lock from buildings or 
trees may create cycle slips. Then the phase must be initialized again. Determining this 
integer is like swimming laps in a pool—after an hour, the fractional part is obvious, but 
it is easy to forget the number of laps completed. You could estimate it by dividing total 
swim time by approximate lap time. For a short swim, the integer is probably reliable. But
the longer you swim, the greater the variance in the ratio. In GPS, the longer the baseline 
between receivers, the harder it is to find this whole number. 

In reality, we have a network of receivers, each counting wavelengths to multiple 
satellites (on two frequencies). There might be 100 integer ambiguities to determine si- 
multaneously. It is a problem in integer least squares. This is identical to the nearest 
lattice vector problem in computational combinatorics: 


Minimize $(x - x_0)^{T} A (x - x_0)$ for $x$ in $\mathbb{Z}^n$.


The minimum over $\mathbb{R}^n$ is clearly zero, at $x = x_0$. We are looking for the lattice point x (the
ambiguity vector) that is closest to $x_0$ in the metric of A. This minimization over $\mathbb{Z}^n$ can
be such a difficult problem (for large random matrices A) that its solution has been used 
by cryptographers to encode messages. In GPS the weighting matrix A involves distances 
between receivers, and the problem is hardest for a global network. 

The minimization is easy when A is diagonal, because the variables are uncoupled. 
Each component of x will be the nearest integer to the corresponding component of $x_0$. But
an ill-conditioned A severely stretches the lattice. A direct search for the best x becomes
horrible. The natural idea is to precondition A by diagonalizing as nearly as possible—
always keeping the change of basis matrices $Z$ and $Z^{-1}$ integral. Then $y^{T}(Z^{T}AZ)y$ is
more nearly uncoupled than $x^{T}Ax$, and $y = Z^{-1}x$ is integral exactly when x is.

If you think of the usual row operations, it is no longer possible to get exact zeros
below the pivots. But subtracting integer multiples will make each off-diagonal entry less 
than half the pivot. Working on the GPS problem in Delft, Peter Teunissen and his group 
found that one further twist is important. The pivots of $Z^{T}AZ$ should come out ordered in
size. Then this preconditioned problem will have a relatively small number of candidates y, 
and a direct search will find the best one. (A good reference is GPS for Geodesy, edited by 
Kleusberg and Teunissen (1996). Our book and web page provide MATLAB code for this 
“LAMBDA method” and a fresh treatment of the Kalman filter for processing the data.) A 
potentially difficult problem has been made tractable for most GPS applications. 
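
A two-variable toy problem shows why coupling matters. In the sketch below the matrix `A`, the real-valued minimizer `x0`, and the integer change of basis `Z` are all chosen by hand for illustration. This is not the LAMBDA method itself, only the idea behind it: rounding each component of `x0` misses the best lattice point, while rounding after an integer decorrelation finds it in this example.

```matlab
A  = [1 -0.9; -0.9 1];          % strongly coupled weighting matrix (made up)
x0 = [0.4; 0.7];                % real-valued minimizer
q  = @(x) (x - x0)'*A*(x - x0); % quadratic to be minimized over integers

xr = round(x0);                 % componentwise rounding ignores the coupling

best = inf;                     % brute-force search over a small box of integers
for i = -3:4
    for j = -3:4
        x = [i; j];
        if q(x) < best, best = q(x); xs = x; end
    end
end

Z  = [1 1; 0 1];                % integer matrix with integer inverse (det = 1)
y0 = Z \ x0;                    % same problem in the new basis: Z'*A*Z = [1 0.1; 0.1 0.2]
xz = Z*round(y0);               % round there, then map back

disp([xr xs xz])                % columns: (0,1), (1,1), (1,1)
```

Rounding in the transformed basis is not guaranteed to be optimal in general. The point of the preconditioning is that the remaining search involves few candidates.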

Shall we try to predict the future? The number of users will continue to increase. The 
24 dedicated satellites will gradually be upgraded. Probably a new civilian frequency will 
be added. (The military might remove their frequency L2. They will certainly not reveal 
the top secret that transforms P code into a secure Y code.) In ordinary life, applications 
to navigation are certain to be enormous. Ships and planes and cars will have GPS built in. 
Maybe the most important prediction is the simplest to state: You will eventually get one. 

It is nice to think that applied mathematics has a part in this useful step forward.


Linear Algebra


VECTORS AND MATRICES 


The heart of linear algebra is in two operations—both with vectors. Chapter 1 explains 
these central ideas, on which everything builds. We start with two-dimensional vectors and 
three-dimensional vectors, which are reasonable to draw. Then we move into higher di- 
mensions. The really impressive feature of linear algebra is how smoothly it takes that step 
into n-dimensional space. Your mental picture stays completely correct, even if drawing a 
ten-dimensional vector is impossible. 

You can add vectors (v + w). You can multiply a vector v by a number (cv). This is 
where the book is going, and the first steps are the two operations in Sections 1.1 and 1.2: 


1.1 Linear combinations cv + dw. 


1.2 The dot product v · w.


Then a matrix appears, with vectors in its columns and vectors in its rows. We take linear 
combinations of the columns. We form dot products with the rows. And one step further— 
there will be linear equations. The equations involve an unknown vector called x, and they 
come into Sections 1.3 and 1.4: 


1.3 The dot product is in the equation for a plane. 


1.4 Linear combinations are in the all-important system Ax = b. 


Those equations are algebra (linear algebra). Behind them lies geometry (linear geome- 
try!). A system of equations is the same thing as an intersection of planes (in n dimensions). 
At the same time, the equation Ax = b looks for the combination of column vectors that 
produces b. 


These are the two ways to look at linear algebra. They are the “row picture” and the 
“column picture.” Chapter 1 makes a direct start on them both. 
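
Both pictures give the same vector b, as a short MATLAB check shows. The matrix and vector below are made up for illustration.

```matlab
A = [2 -1; -1 2];                        % illustrative 2 by 2 matrix
x = [1; 2];

b_col = x(1)*A(:,1) + x(2)*A(:,2);       % column picture: combination of the columns
b_row = [A(1,:)*x; A(2,:)*x];            % row picture: dot product with each row
b     = A*x;                             % MATLAB's matrix-vector product

xs = A \ b;                              % solving Ax = b recovers x
disp([b_col b_row b])                    % three identical columns (0, 3)
```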


1.1 Vectors 


“You can’t add apples and oranges.” That sentence might not be news, but it still contains 
some truth. In a strange way, it is the reason for vectors! If we keep the number of
apples separate from the number of oranges, we have a pair of numbers. That pair is a
two-dimensional vector v:


$$v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \qquad \begin{aligned} v_1 &= \text{number of apples} \\ v_2 &= \text{number of oranges.} \end{aligned}$$


We wrote v as a column vector. The numbers $v_1$ and $v_2$ are its “components.” The main
point so far is to have a single letter v (in boldface) for this pair of numbers (in lightface). 

Even if we don’t add $v_1$ to $v_2$, we do add vectors. The first components of v and w
stay separate from the second components: 


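$$v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \quad\text{and}\quad w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \quad\text{add to}\quad v + w = \begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \end{bmatrix}.$$

You see the reason. The total number of apples is $v_1 + w_1$. The number of oranges is
$v_2 + w_2$. Vector addition is basic and important. Subtraction of vectors follows the same
idea: The components of v - w are ____ and ____.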


Vectors can be multiplied by 2 or by -1 or by any number c. There are two ways to
double a vector. One way is to add v + v. The other way (the usual way) is to multiply
each component by 2:

$$2v = \begin{bmatrix} 2v_1 \\ 2v_2 \end{bmatrix} \quad\text{and}\quad -v = \begin{bmatrix} -v_1 \\ -v_2 \end{bmatrix}.$$

The components of cv are $cv_1$ and $cv_2$. The number c is called a “scalar.”

Notice that the sum of -v and v is the zero vector. This is 0, which is not the same
as the number zero! The vector 0 has components 0 and 0. Forgive us for hammering away 
at the difference between a vector and its components. Linear algebra is built on these 
operations v + w and cv—adding vectors and multiplying by scalars.

Figure 1.1 The arrow usually starts at the origin; cv is always parallel to v.

Figure 1.2 Vector addition v + w produces a parallelogram. Add the components separately.

There is another way to see a vector, that shows all its components at once. The
vector v can be represented by an arrow. When v has two components, the arrow is in
two-dimensional space (a plane). If the components are $v_1$ and $v_2$, the arrow goes $v_1$ units
to the right and $v_2$ units up. This vector is drawn twice in Figure 1.1. First, it starts at the
origin (where the axes meet). This is the usual picture. Unless there is a special reason,
our vectors will begin at (0, 0). But the second arrow shows the starting point shifted over
to A. The arrows OP and AB represent the same vector. One reason for allowing all
starting points is to visualize the sum v + w:


Vector addition (head to tail) At the end of v, place the start of w. 


In Figure 1.2, v + w goes from the beginning of v to the end of w. 


We travel along v and then along w. Or we take the shortcut along v + w. We could also 
go along w and then v. In other words, w + v gives the same answer as v + w. These are 
different ways along the parallelogram (in this example it is a rectangle). The endpoint is
still the diagonal v + w.

Check that by algebra: The first component is $v_1 + w_1$ which equals $w_1 + v_1$. The
order makes no difference, and v + w = w + v: 


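$$\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} + \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \end{bmatrix} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}.$$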


The zero vector has $v_1 = 0$ and $v_2 = 0$. It is too short to draw a decent arrow, but you
know that v + 0 = v. For 2v we double the length of the arrow. We reverse its direction
for -v. This reversing gives a geometric way to subtract vectors.


Vector subtraction To draw v - w, go forward along v and then backward along w
(Figure 1.3). The components are $v_1 - w_1$ and $v_2 - w_2$.


We will soon meet a “product” of vectors. It is not the vector whose components are $v_1 w_1$
and $v_2 w_2$.


Linear Combinations 


We have added vectors, subtracted vectors, and multiplied by scalars. The answers v + w,
v - w, and cv are computed a component at a time. By combining these operations, we
now form “linear combinations” of v and w. Apples still stay separate from oranges—the
combination is a new vector cv + dw.

Figure 1.3 Vector subtraction v - w. The linear combination 3v + 2w.


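Definition The sum of cv and dw is a linear combination of v and w. Figure 1.3b shows
the combination 3v + 2w:

$$3\begin{bmatrix} 2 \\ -1 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 8 \\ -1 \end{bmatrix}.$$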


This is the fundamental construction of linear algebra: multiply and add. The sum
v + w is a special combination, when c = d = 1. The multiple 2v is the particular case
with c = 2 and d = 0. Soon you will be looking at all linear combinations of v and w—a 
whole family of vectors at once. It is this big view, going from two vectors to a “plane of 
vectors,” that makes the subject work. 

In the forward direction, a combination of v and w is supremely easy. We are given 
the multipliers c = 3 and d = 2, so we multiply. Then add 3v + 2w. The serious problem 
is the opposite question, when c and d are “unknowns.” In that case we are only given the 
answer cv + dw (with components 8 and -1). We look for the right multipliers c and d.
The two components give two equations in these two unknowns. 

When 100 unknowns multiply 100 vectors each with 100 components, the best way 
to find those unknowns is explained in Chapter 2. 
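
For two unknowns the reverse problem is a 2 by 2 linear system, and MATLAB's backslash solves it. The sketch below is only an illustration: it borrows v = (4, 2) and w = (-1, 2) from Figure 1.2 and assumes the target b = (10, 10), which was built as 3v + 2w.

```matlab
v = [4; 2];
w = [-1; 2];
b = [10; 10];            % target vector, constructed here as 3*v + 2*w

cd = [v w] \ b;          % the unknowns c and d multiply the columns v and w
c = cd(1);
d = cd(2);

disp([c d])              % approximately 3 2
disp((c*v + d*w)')       % reproduces b
```

Putting v and w into the columns of a matrix is exactly the step from "linear combination" to the system Ax = b.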


Vectors in Three Dimensions 


A vector with two components corresponds to a point in the xy plane. The components of 
the vector are the coordinates of the point: $x = v_1$ and $y = v_2$. The arrow ends at this
point $(v_1, v_2)$, when it starts from (0, 0). Now we allow vectors to have three components.
The xy plane is replaced by three-dimensional space. 

Here are typical vectors (still column vectors but with three components): 


$$v = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$$


Figure 1.4 Vectors $\begin{bmatrix} x \\ y \end{bmatrix}$ and $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ correspond to points (x, y) and (x, y, z).


The vector v corresponds to an arrow in 3-space. Usually the arrow starts at the origin, 
where the xyz axes meet and the coordinates are (0, 0, 0). The arrow ends at the point with 
coordinates x = 1, y = 2, z = 2. There is a perfect match between the column vector 
and the arrow from the origin and the point where the arrow ends. Those are three ways to 
describe the same vector: 


From now on $v = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$ is also written as $v = (1, 2, 2)$.


The reason for the first form (in a column) is to fit next to a matrix. The reason for the
second form is to save space. This becomes essential for vectors with many components. 
To print (1, 2, 2, 4, 4, 6) in a column would waste the environment. 


Important note (1, 2, 2) is not a row vector. The row vector [1 2 2] is absolutely
different, even though it has the same three components. A column vector can be printed
vertically (with brackets). It can also be printed horizontally (with commas and parenthe- 
ses). Thus (1, 2, 2) is in actuality a column vector. It is just temporarily lying down. 


In three dimensions, vector addition is still done a component at a time. The result 
v + w has components $v_1 + w_1$ and $v_2 + w_2$ and $v_3 + w_3$—maybe apples, oranges, and
pears. You see already how to add vectors in 4 or 5 or n dimensions—this is the end of 
linear algebra for groceries. 

The addition v + w is represented by arrows in space. When w starts at the end 
of v, the third side is v + w. When w follows v, we get the other sides of a parallelogram. 
Question: Do the four sides all lie in the same plane? Yes. And the sum v + w - v - w
goes around the parallelogram to produce ____ 


Summary This first section of the book explained vector addition v + w and scalar multi- 
plication cv. Then we moved to other linear combinations cv +dw. A typical combination 
of three vectors in three dimensions is u + 4v - 2w:


$$\begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix} + 4\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} - 2\begin{bmatrix} 2 \\ 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 9 \end{bmatrix}.$$


We end with this question: What surface do you get from all the linear combinations of u
and v? The surface includes the line through u and the line through v. It includes the zero 
vector (which is the combination 0u + 0v). The surface also includes the diagonal line
through u + v—and every other combination cu + dv. This whole surface is a plane.


Note on Computing Suppose the components of v are v(1),..., v(N), and similarly 
for w. In a language like FORTRAN, the sum v + w requires a loop to add components 
separately: 


      DO 10 I = 1,N
   10 VPLUSW(I) = v(I) + w(I)


MATLAB works directly with vectors and matrices. When v and w have been de- 
fined, v + w is immediately understood. It is printed unless the line ends in a semicolon. 
Input two specific vectors as rows—the prime ’ at the end changes them to columns. Then 
print v + w and another linear combination: 


    v = [2 3 4]';
    w = [1 1 1]';
    u = v + w
    2*v - 3*w

The sum will print with “u = ”. The unnamed combination prints with “ans = ”:

    u =          ans =
         3            1
         4            3
         5            5


Problem Set 1.1 


Problems 1-9 are about addition of vectors and linear combinations. 

1 Draw the vectors v = [ $ ] and w = [73] and v + w and v — w in a single xy plane. 
2 If v + w = [}] and v - w = [1], compute and draw v and w.

3 From v = |} ] and w = [1 ], find the components of 3v + w and v - 3w and cv + dw.


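4 Compute u + v and u + v + w and 2u + 2v + w when

$$u = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad v = \begin{bmatrix} -3 \\ 1 \\ -2 \end{bmatrix}, \quad w = \begin{bmatrix} 2 \\ -3 \\ -1 \end{bmatrix}.$$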


5 (a) Every multiple of v = (1, -2, 1) has components that add up to ____.
(b) Every linear combination of v and w = (0, 1, -1) has components that add to ____.
(c) Find c and d so that cv + dw = (4, 2, -6).

6 In the xy plane mark these nine linear combinations:


CHEZIH with c=0,1,2 and d=0,1,2. 


7 (a) The subtraction v - w goes forward along v and backward on w. Figure 1.3
also shows a second route to v - w. What is it?
(b) If you look at all combinations of v and w, what “surface of vectors” do you see?


8 The parallelogram in Figure 1.2 has diagonal v + w. What is its other diagonal? 
What is the sum of the two diagonals? Draw that vector sum. 


9 If three corners of a parallelogram are (1,1), (4,2), and (1,3), what are all the
possible fourth corners? Draw two of them.

Problems 10-13 involve the length of vectors. Compute (length of v)$^2$ as $v_1^2 + v_2^2$.


10 The parallelogram with sides v = (4,2) and w = (-1, 2) looks like a rectangle
(Figure 1.2). Check the Pythagoras formula $a^2 + b^2 = c^2$ which is for right triangles
only:

(length of v)$^2$ + (length of w)$^2$ = (length of v + w)$^2$.


11 In this 90° case, $a^2 + b^2 = c^2$ also works for v - w. Check that
(length of v)$^2$ + (length of w)$^2$ = (length of v - w)$^2$.
Give an example of v and w for which this formula fails.


12 To emphasize that right triangles are special, construct v and w and v + w without a 
90° angle. Compare (length of v)$^2$ + (length of w)$^2$ with (length of v + w)$^2$.


13 In Figure 1.2 check that (length of v) + (length of w) is larger than (length of v + w). 
This is true for every triangle, except the very thin triangle when v and w are ____ 
Notice that these lengths are not squared. 


Problems 14-18 are about special vectors on cubes and clocks.

14 Copy the cube and draw the vector sum of i = (1, 0, 0) and j = (0, 1, 0) and
k = (0, 0, 1). The addition i + j yields the diagonal of ____.


15 Three edges of the unit cube are i, j, k. Its eight corners are (0, 0, 0), (1, 0, 0),
(0, 1, 0), ____. What are the coordinates of the center point? The center points of
the six faces are ____.


16 How many corners does a cube have in 4 dimensions? How many faces? How many 
edges? A typical corner is (0, 0, 1, 0). 


17 (a) What is the sum V of the twelve vectors that go from the center of a clock to 
the hours 1:00, 2:00, ..., 12:00? 
(b) If the vector to 4:00 is removed, find the sum of the eleven remaining vectors. 


(c) Suppose the 1:00 vector is cut in half. Add it to the other eleven vectors. 


[Figure: unit cube with edges i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1); clock face with the vector to 4:00.]


18 Suppose the twelve vectors start from the top of the clock (at 12:00) instead of the
center. The vector to 6:00 is doubled and the vector to 12:00 is now zero. Add the
new twelve vectors.


Problems 19-27 go further with linear combinations of v and w (and u).


19 The first figure shows $u = \tfrac{1}{2}v + \tfrac{1}{2}w$. This is halfway between v and w because
$v - u = \tfrac{1}{2}$( ____ ). Mark the points $\tfrac{3}{4}v + \tfrac{1}{4}w$ and $\tfrac{1}{4}v + \tfrac{1}{4}w$.


20 Mark the point -v + 2w and one other combination cv + dw with c + d = 1. Draw
the line of all these “affine” combinations that have c + d = 1.


21 Locate $\tfrac{1}{3}v + \tfrac{1}{3}w$ and $\tfrac{2}{3}v + \tfrac{2}{3}w$. The combinations cv + cw fill out what line? Restricted
by c > 0 those combinations with c = d fill what ray?


22 (a) Mark $\tfrac{1}{2}v + w$ and $v + \tfrac{1}{2}w$. Restricted by 0 < c < 1 and 0 < d < 1, shade in
all combinations cv + dw.
(b) Restricted only by 0 < c and 0 < d draw the “cone” of all combinations.


23 The second figure shows vectors u, v, w in three-dimensional space.
(a) Locate $\tfrac{1}{3}u + \tfrac{1}{3}v + \tfrac{1}{3}w$ and $\tfrac{1}{2}u + \tfrac{1}{2}w$.
(b) Challenge problem: Under what restrictions on c, d, e, and c + d + e will the
points cu + dv + ew fill in the dotted triangle with corners u, v, w?


24 The three sides of the dotted triangle are v - u and w - v and u - w. Their sum is
____. Draw the head-to-tail addition around a plane triangle of (3, 1) plus (-1, 1)
plus (-2, -2).


25 Shade in the pyramid of combinations cu + dv + ew with c > 0, d > 0, e > 0 and
c + d + e < 1. Mark the vector $\tfrac{1}{2}(u + v + w)$ as inside or outside this pyramid.


26 In 3-dimensional space, which vectors are combinations of u, v, and w? Which
vectors are in the plane of u and v, and also in the plane of v and w?


27 (a) Choose u, v, w so that their combinations fill only a line.
(b) Choose u, v, w so that their combinations fill only a plane.


[Figures for Problems 19-22 and Problems 23-26]


1.2 Lengths and Dot Products 


The first section mentioned multiplication of vectors, but it backed off. Now we go forward 
to define the “dot product” of v and w. This multiplication involves the separate products 
$v_1 w_1$ and $v_2 w_2$, but it doesn’t stop there. Those two numbers are added to produce the 
single number v · w. 


Definition The dot product or inner product of $v = (v_1, v_2)$ and $w = (w_1, w_2)$ is the 
number 

$$v \cdot w = v_1 w_1 + v_2 w_2. \tag{1.1}$$


Example 1.1 The vectors v = (4, 2) and w = (-1, 2) happen to have a zero dot product: 

$$\begin{bmatrix} 4 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 2 \end{bmatrix} = -4 + 4 = 0.$$


In mathematics, zero is always a special number. For dot products, it means that these vec- 
tors are perpendicular. The angle between them is 90°. When we drew them in Figure 1.2, 
we saw a rectangle (not just a parallelogram). The clearest example of perpendicular 
vectors is i = (1, 0) along the x axis and j = (0, 1) up the y axis. Again the dot product is 
i · j = 0 + 0 = 0. Those vectors form a right angle. 

The vectors v = (1, 2) and w = (2, 1) are not perpendicular. Their dot product is 4. 
Soon that will reveal the angle between them (not 90°). 
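
The definition can be tried out numerically. What follows is a minimal Python sketch, not code from the book; the function name `dot` is our own choice for illustration. It multiplies matching components, adds the products, and checks the pairs of vectors discussed above.

```python
def dot(v, w):
    """Dot product: multiply matching components, then add the products."""
    if len(v) != len(w):
        raise ValueError("v and w must have the same number of components")
    return sum(vi * wi for vi, wi in zip(v, w))


print(dot((4, 2), (-1, 2)))  # the pair from Example 1.1
print(dot((1, 0), (0, 1)))   # the coordinate vectors i and j
print(dot((1, 2), (2, 1)))   # a pair that is not perpendicular
```

A result of zero signals perpendicular vectors. The same function works unchanged for vectors with three or more components.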


Example 1.2 Put a weight of 4 at the point x = —1 and a weight of 2 at the point x = 2. 
If the x axis is a see-saw, it will balance on the center point x = 0. The weight at x = —1 
balances the child at x = 2. They balance because the dot product is (4)(—1) + (2) (2) = 0. 


This example is typical of engineering and science. The vector of weights is 
$(w_1, w_2) = (4, 2)$. The vector of distances from the center is $(v_1, v_2) = (-1, 2)$. The 
“moment” of the first weight is $w_1$ times $v_1$, force times distance. The moment of the 
second weight is $w_2 v_2$. The total moment is the dot product $w_1 v_1 + w_2 v_2$. 

First comment The dot product w · v equals v · w. The order of v and w makes no 
difference. 


Second comment The “vector w” is just a list of weights. The “vector v” is just a list 
of distances. With a giant child sitting at the center point, the weight vector would be 
(4, 2, 100). The distance vector would be (-1, 2, 0). The dot product would still be zero 
and the see-saw would still balance: 

$$\begin{bmatrix} 4 \\ 2 \\ 100 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 2 \\ 0 \end{bmatrix} = (4)(-1) + (2)(2) + (100)(0) = 0.$$


With six weights, the vectors will be six-dimensional. This is how linear algebra goes 
quickly into high dimensions. The x axis or the see-saw is only one-dimensional, but you 
have to make that leap to a vector of six weights. 


Example 1.3 Dot products enter in economics and business. We have five products to buy 
and sell. The unit prices are $(p_1, p_2, p_3, p_4, p_5)$, the “price vector.” The quantities we 
buy or sell are $(q_1, q_2, q_3, q_4, q_5)$, positive when we sell, negative when we buy. Selling 
the quantity $q_1$ at the price $p_1$ brings in $q_1 p_1$. The total income is the dot product q · p: 

$$\text{Income} = (q_1, q_2, \ldots, q_5) \cdot (p_1, p_2, \ldots, p_5) = q_1 p_1 + q_2 p_2 + \cdots + q_5 p_5.$$

A zero dot product means that “the books balance.” Total sales equal total purchases if 
q · p = 0. Then p is perpendicular to q (in five-dimensional space). 

Small note: Spreadsheets have become essential computer software in management. 
What does a spreadsheet actually do? It computes linear combinations and dot products. 
What you see on the screen is a matrix. 


Question: How are linear combinations related to dot products? Answer: Look at the 
first component of 3v - 2w. 

$$\text{A typical linear combination is}\quad 3\begin{bmatrix} 4 \\ 0 \\ 4 \end{bmatrix} - 2\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 10 \\ -4 \\ 6 \end{bmatrix}.$$

The number 10 is 3 times 4 minus 2 times 1. Think of that as (3, -2) · (4, 1) = 10. This 
is a dot product. The numbers -4 and 6 are also dot products. When vectors have m 
components, a linear combination uses m dot products. 
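
To illustrate the point, each component of cv + dw can be computed as the dot product of the coefficients (c, d) with the matching components of v and w. The Python sketch below assumes the vectors v = (4, 0, 4) and w = (1, 2, 3) of the combination above; the helper names `dot` and `combination` are hypothetical.

```python
def dot(a, b):
    """Multiply matching components and add."""
    return sum(x * y for x, y in zip(a, b))


def combination(c, d, v, w):
    """Return c*v + d*w, computed as one dot product per component."""
    return [dot((c, d), (vk, wk)) for vk, wk in zip(v, w)]


v = (4, 0, 4)
w = (1, 2, 3)
print(combination(3, -2, v, w))  # three components, so three dot products
```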


Main point To compute the dot product, multiply each $v_i$ times $w_i$. Then add. 


Lengths and Unit Vectors 


An important case is the dot product of a vector with itself. In this case v = w. When the 
vector is v = (1, 2, 3), the dot product is v · v = 14: 

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = 1 + 4 + 9 = 14.$$

The answer is not zero because v is not perpendicular to itself. Instead of a 90° angle we 
have 0°. In this special case, the dot product v · v gives the length squared. 


Definition The length (or norm) of a vector v is the square root of v · v: 

$$\text{length} = \|v\| = \sqrt{v \cdot v}.$$

In two dimensions the length is $\|v\| = \sqrt{v_1^2 + v_2^2}$. In three dimensions it is 
$\|v\| = \sqrt{v_1^2 + v_2^2 + v_3^2}$. By the calculation above, the length of v = (1, 2, 3) is $\|v\| = \sqrt{14}$. 


We can explain this definition. ||v|| is just the ordinary length of the arrow that 
represents the vector. In two dimensions, the arrow is in a plane. If the components are 1 
and 2, the arrow is the third side of a right triangle (Figure 1.5). The formula $a^2 + b^2 = c^2$, 
which connects the three sides, is $1^2 + 2^2 = \|v\|^2$. That formula gives $\|v\| = \sqrt{5}$. Lengths 
are always positive, except for the zero vector with ||0|| = 0. 

For the length of v = (1, 2, 3), we used the right triangle formula twice. First, the 
vector in the base has components 1, 2, 0 and length $\sqrt{5}$. This base vector is perpendicular 
to the vector (0, 0, 3) that goes straight up. So the diagonal of the box has length 
$\|v\| = \sqrt{5 + 9} = \sqrt{14}$. 

The length of a four-dimensional vector would be $\sqrt{v_1^2 + v_2^2 + v_3^2 + v_4^2}$. Thus 
(1, 1, 1, 1) has length $\sqrt{1^2 + 1^2 + 1^2 + 1^2} = 2$. This is the diagonal through a unit cube 
in four-dimensional space. The diagonal in three dimensions has length $\sqrt{3}$. 
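
The lengths quoted in this passage can be checked with a short Python sketch. This is only an illustration under our own naming (`length` is not a function from the book); it takes the square root of v · v in any number of dimensions.

```python
import math


def length(v):
    """Length (norm) of v: the square root of v . v."""
    return math.sqrt(sum(x * x for x in v))


print(length((1, 2)))        # square root of 5
print(length((1, 2, 3)))     # square root of 14
print(length((1, 1, 1, 1)))  # diagonal of the unit cube in four dimensions
```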

The word “unit” is always indicating that some measurement equals “one.” The unit 
price is the price for one item. A unit cube has sides of length one. A unit circle is a circle 
with radius one. Now we define the idea of a “unit vector.” 


Definition A unit vector u is a vector whose length equals one. Then u · u = 1. 

An example in four dimensions is $u = (\frac{1}{2}, \frac{1}{2}, \frac{1}{2}, \frac{1}{2})$. Then u · u is $\frac{1}{4} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = 1$. We 
divided the vector v = (1, 1, 1, 1) by its length ||v|| = 2 to get the unit vector. 


[Figure 1.5: labeled points (0, 0, 3) and (0, 2, 0); the base vector (1, 2, 0) has length $\sqrt{5}$ 
and the vector (1, 2, 3) has length $\sqrt{14}$.] 


Figure 1.5 The length of two-dimensional and three-dimensional vectors. 
[Figure 1.6: the vectors i = (1, 0), j = (0, 1), -i, -j, the sum i + j = (1, 1), and the unit 
vector $u = (1, 1)/\sqrt{2}$.] 


Figure 1.6 The coordinate vectors i and j. The unit vector u at angle 45° and the unit 
vector at angle θ. 


Example 1.4 The standard unit vectors along the x and y axes are written i and j. In the 
xy plane, the unit vector that makes an angle “theta” with the x axis is u: 

$$i = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad\text{and}\quad j = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \quad\text{and}\quad u = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}.$$

When θ = 0, the vector u is the same as i. When θ = 90° (or π/2 radians), u is the 
same as j. At any angle, the components cos θ and sin θ produce u · u = 1 because 
$\cos^2\theta + \sin^2\theta = 1$. These vectors reach out to the unit circle in Figure 1.6. Thus cos θ 
and sin θ are simply the coordinates of that point at angle θ on the unit circle. 


In three dimensions, the unit vectors along the axes are i, j, and k. Their components are 
(1, 0, 0) and (0, 1, 0) and (0, 0, 1). Notice how every three-dimensional vector is a linear 
combination of i, j, and k. The vector v = (2, 2, 1) is equal to 2i + 2j + k. Its length is 
$\sqrt{2^2 + 2^2 + 1^2}$. This is the square root of 9, so ||v|| = 3. 

Since (2, 2, 1) has length 3, the vector $(\frac{2}{3}, \frac{2}{3}, \frac{1}{3})$ has length 1. To create a unit vector, 
just divide v by its length ||v||. 


1A Unit vectors Divide any nonzero vector v by its length. The result is u = v/||v||. 
This is a unit vector in the same direction as v. 

Figure 1.6 shows the components of i + j = (1, 1) divided by the length $\sqrt{2}$. Then 
$u_1^2 = \frac{1}{2}$ and $u_2^2 = \frac{1}{2}$ and $\|u\|^2 = \frac{1}{2} + \frac{1}{2} = 1$. When we divide the vector by ||v||, we divide 
its length by ||v||. This leaves a unit length ||u|| = 1. 

In three dimensions we found $u = (\frac{2}{3}, \frac{2}{3}, \frac{1}{3})$. Check that $u \cdot u = \frac{4}{9} + \frac{4}{9} + \frac{1}{9} = 1$. 
Then u reaches out to the “unit sphere” centered at the origin. Unit vectors correspond to 
points on the sphere of radius one. 
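
Rule 1A translates directly into a few lines of code. The sketch below is a hedged Python illustration; the name `unit_vector` and the error raised for the zero vector are our own choices, not part of the text.

```python
import math


def unit_vector(v):
    """Divide a nonzero vector by its length to get a unit vector."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("the zero vector has no direction")
    return tuple(x / norm for x in v)


u = unit_vector((2, 2, 1))
print(u)                      # components 2/3, 2/3, 1/3
print(sum(x * x for x in u))  # u . u, equal to 1 up to rounding
```

The zero vector is excluded because dividing by a length of zero is undefined, which is why 1A asks for a nonzero v.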


The Angle Between Two Vectors 


We stated that perpendicular vectors have v · w = 0. The dot product is zero when the 
angle is 90°. To give a reason, we have to connect right angles to dot products. Then we 
show how to use v · w to find the angle between two nonzero vectors, perpendicular or not. 
[Figure 1.7: the vector v, the dividing line where v · w = 0, and the half-plane in which 
the angle with v is below 90°.] 


Figure 1.7 Perpendicular vectors have v · w = 0. The angle θ is below 90° when 
v · w > 0. 


1B Right angles The dot product v · w is zero when v is perpendicular to w. 


Proof When v and w are perpendicular, they form two sides of a right triangle. The third 
side (the hypotenuse going across in Figure 1.7) is v - w. So the law $a^2 + b^2 = c^2$ for 
their lengths becomes 

$$\|v\|^2 + \|w\|^2 = \|v - w\|^2 \quad\text{(for perpendicular vectors only).} \tag{1.2}$$

Writing out the formulas for those lengths (in two dimensions), this equation is 

$$v_1^2 + v_2^2 + w_1^2 + w_2^2 = (v_1 - w_1)^2 + (v_2 - w_2)^2. \tag{1.3}$$

The right side begins with $v_1^2 - 2v_1 w_1 + w_1^2$. Then $v_1^2$ and $w_1^2$ are on both sides of the 
equation and they cancel. Similarly $(v_2 - w_2)^2$ contains $v_2^2$ and $w_2^2$ and the cross term 
$-2v_2 w_2$. All terms cancel except $-2v_1 w_1$ and $-2v_2 w_2$. (In three dimensions there would 
also be $-2v_3 w_3$.) The last step is to divide by -2: 

$$0 = -2v_1 w_1 - 2v_2 w_2 \quad\text{which leads to}\quad v_1 w_1 + v_2 w_2 = 0. \tag{1.4}$$

Conclusion Right angles produce v · w = 0. We have proved Theorem 1B. The dot 
product is zero when the angle is θ = 90°. Then cos θ = 0. 

The zero vector v = 0 is perpendicular to every vector w because 0 · w is always 
zero. 

Now suppose v · w is not zero. It may be positive, it may be negative. The sign of 
v · w immediately tells whether we are below or above a right angle. The angle is less 
than 90° when v · w is positive. The angle is above 90° when v · w is negative. Figure 1.7 
shows a typical vector w = (1, 3) in the white half-plane, with v · w > 0. The vector 
W = (-2, 0) in the screened half-plane has v · W < 0. 

The vectors we drew have v · w = 10 and v · W = -8. The borderline is where 
vectors are perpendicular to v. On that dividing line between plus and minus, the dot 
product is zero. 
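
The sign test can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and v = (4, 2) is an assumption on our part, chosen because it reproduces the values v · w = 10 and v · W = -8 quoted above.

```python
def dot(v, w):
    """Multiply matching components and add."""
    return sum(vi * wi for vi, wi in zip(v, w))


def angle_type(v, w):
    """Classify the angle between v and w from the sign of v . w."""
    d = dot(v, w)
    if d > 0:
        return "below 90 degrees"
    if d < 0:
        return "above 90 degrees"
    return "perpendicular"


v = (4, 2)  # assumed vector v, consistent with the dot products in the text
for other in [(1, 3), (-2, 0), (-1, 2)]:
    print(other, dot(v, other), angle_type(v, other))
```

No square roots or inverse cosines are needed for this classification; the sign of one dot product is enough.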


The next page takes one more step with the geometry of dot products. We find the exact 
angle θ. This is not necessary for linear algebra; you could stop here! Once we have 
matrices and linear equations, we won’t come back to θ. But while we are on the subject 
of angles, this is the place for the formula. It will show that the angle θ in Figure 1.7 is 
exactly 45°. 

Figure 1.8 The dot product of unit vectors is the cosine of the angle θ. 

Start with unit vectors u and U. The sign of u · U tells whether θ < 90° or θ > 90°. 
Because the vectors have length 1, we learn more than that. The dot product u · U is the 
cosine of θ. 


1C (a) If u and U are unit vectors then u · U = cos θ. 


(b) If u and U are unit vectors then |u · U| ≤ 1. 


Statement (b) follows directly from (a). Remember that cos θ is never greater than 1. It is 
never less than -1. So we discover an extra fact about unit vectors. Their dot product is 
between -1 and 1. 


Figure 1.8 shows this clearly when the vectors are (cos θ, sin θ) and (1, 0). The dot product 
of those vectors is cos θ. That is the cosine of the angle between them. 

After rotation through any angle α, these are still unit vectors. The angle between 
them is still θ. The vectors are u = (cos α, sin α) and U = (cos β, sin β). Their dot 
product is cos α cos β + sin α sin β. From trigonometry this is the same as cos(β - α). 
Since β - α equals θ we have reached the formula u · U = cos θ. Then the fact that 
|cos θ| ≤ 1 tells us immediately that |u · U| ≤ 1. 

Problem 24 proves the inequality |u · U| ≤ 1 directly, without mentioning angles. 
Problem 22 applies another formula from trigonometry, the “Law of Cosines.” The 
inequality and the cosine formula are true in any number of dimensions. The dot product 
does not change when vectors are rotated: the angle between them stays the same. 

What if v and w are not unit vectors? Their dot product is not generally cos θ. We 
have to divide by their lengths to get unit vectors u = v/||v|| and U = w/||w||. Then the 
dot product of those rescaled vectors u and U gives cos θ. 

Whatever the angle, this dot product of v/||v|| with w/||w|| never exceeds one. That 
is the “Schwarz inequality” for dot products, or more correctly the 
Cauchy-Schwarz-Buniakowsky inequality. It was found in France and Germany and Russia 
(and maybe elsewhere; it is the most important inequality in mathematics). With the extra 
factor ||v|| ||w|| from rescaling to unit vectors, we have cos θ: 

1D (a) COSINE FORMULA If v and w are nonzero vectors then $\dfrac{v \cdot w}{\|v\|\,\|w\|} = \cos\theta$. 

(b) SCHWARZ INEQUALITY If v and w are any vectors then $|v \cdot w| \le \|v\|\,\|w\|$. 
Example 1.5 Find the angle between $v = \begin{bmatrix} 4 \\ 2 \end{bmatrix}$ and $w = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$ in Figure 1.7b. 

Solution The dot product is v · w = 10. The length of v is $\|v\| = \sqrt{20}$. The length of w 
is $\|w\| = \sqrt{10}$. Therefore the cosine of the angle is 

$$\cos\theta = \frac{v \cdot w}{\|v\|\,\|w\|} = \frac{10}{\sqrt{20}\,\sqrt{10}} = \frac{1}{\sqrt{2}}.$$

The angle that has this cosine is 45°. It is below 90° because v · w = 10 is positive. Of 
course $\|v\|\,\|w\| = \sqrt{200}$ is larger than 10. 


Example 1.6 For v = (a, b) and w = (b, a) the Schwarz inequality says that 
$2ab \le a^2 + b^2$. For example 2(3)(4) = 24 is less than $3^2 + 4^2 = 25$. 


The dot product of (a, b) and (b, a) is 2ab. The lengths are $\|v\| = \|w\| = \sqrt{a^2 + b^2}$. 
Then v · w = 2ab never exceeds $\|v\|\,\|w\| = a^2 + b^2$. The difference between them can 
never be negative: 

$$a^2 + b^2 - 2ab = (a - b)^2 \ge 0.$$

This is more famous if we write $x = a^2$ and $y = b^2$. Then the “geometric mean” $\sqrt{xy}$ is 
not larger than the “arithmetic mean,” which is the average of x and y: 

$$ab \le \frac{a^2 + b^2}{2} \quad\text{becomes}\quad \sqrt{xy} \le \frac{x + y}{2}.$$
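
A quick numerical check of this example, written as a Python sketch. The function name `schwarz_gap` and the sample values are our own, chosen only for illustration.

```python
import math


def schwarz_gap(a, b):
    """a^2 + b^2 - 2ab, which equals (a - b)^2 and is never negative."""
    return a * a + b * b - 2 * a * b


for a, b in [(3, 4), (1, 1), (-2, 5)]:
    print(a, b, schwarz_gap(a, b), (a - b) ** 2)  # the last two columns agree

x, y = 2, 8  # sample positive numbers
print(math.sqrt(x * y), (x + y) / 2)  # geometric mean, then arithmetic mean
```

The gap is zero only when a = b, which is the case where the geometric and arithmetic means coincide.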


Computing dot products and lengths and angles It is time for a moment of truth. The 
dot product v - w is usually seen as a row times a column: 


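Instead of $\begin{bmatrix} 1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ we more often see $\begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 3 \\ 4 \end{bmatrix}$. 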


In FORTRAN we multiply components and add (using a loop): 


      VDOTW = 0.0 
      DO 10 I = 1,N 
   10 VDOTW = VDOTW + V(I) * W(I) 


MATLAB works with whole vectors, not their components. If v and w are column vectors 
then v' is a row as above: 


dot = v' * w 


The length of v is already known to MATLAB as norm(v). We could define it ourselves 
as sqrt(v' * v), using the square root function, which is also known. The cosine and the 
angle (in radians) we do have to define ourselves: 


cos = v' * w/(norm(v) * norm(w)); 
angle = acos(cos) 


We used the arc cosine (acos) function to find the angle from its cosine. We have not cre- 
ated a new function cos(v,w) for future use. That would become an M-file, and Chapter 2 
will show its format. (Quite a few M-files have been created especially for this book. They 
are listed at the end.) The instructions above will cause the numbers dot and angle to be 
printed. The cosine will not be printed because of the semicolon. 
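
For comparison, the same steps can be sketched in Python using only the standard library. This is a hedged illustration rather than one of the book's M-files: the function names are ours, v = (4, 2) and w = (1, 3) are assumed sample vectors with v · w = 10, and the clamp to the interval [-1, 1] is an added safeguard against rounding.

```python
import math


def dot(v, w):
    """Multiply matching components and add."""
    return sum(vi * wi for vi, wi in zip(v, w))


def norm(v):
    """Length of v, the square root of v . v."""
    return math.sqrt(dot(v, v))


def angle(v, w):
    """Angle in radians between nonzero vectors v and w."""
    cosine = dot(v, w) / (norm(v) * norm(w))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding outside [-1, 1]
    return math.acos(cosine)


v, w = (4, 2), (1, 3)
print(dot(v, w))
print(math.degrees(angle(v, w)))  # close to 45 for this pair
```

Without the clamp, rounding could push the cosine of two parallel vectors slightly above 1, and `math.acos` would raise an error.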
Problem Set 1.2 


1 Calculate the dot products u · v and u · w and v · w and w · v: 

$$u = \begin{bmatrix} -.6 \\ .8 \end{bmatrix} \qquad v = \begin{bmatrix} 3 \\ 4 \end{bmatrix} \qquad w = \begin{bmatrix} 4 \\ 3 \end{bmatrix}.$$


2 Compute the lengths ||u|| and ||v|| and ||w|| of those vectors. Check the Schwarz 
inequalities |u · v| ≤ ||u|| ||v|| and |v · w| ≤ ||v|| ||w||. 


3 Write down unit vectors in the directions of v and w in Problem 1. Find the cosine 
of the angle θ between them. 


4 Find unit vectors $u_1$ and $u_2$ that are parallel to v = (3, 1) and w = (2, 1, 2). Also 
find unit vectors $U_1$ and $U_2$ that are perpendicular to v and w. 


5 For any unit vectors v and w, find the angle θ between 
(a) v and v (b) w and -w (c) v + w and v - w. 


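6 Find the angle θ (from its cosine) between 

(a) $v = \begin{bmatrix} 1 \\ \sqrt{3} \end{bmatrix}$ and $w = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ (b) $v = \begin{bmatrix} 2 \\ 2 \\ -1 \end{bmatrix}$ and $w = \begin{bmatrix} 2 \\ -1 \\ 2 \end{bmatrix}$ 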


(c) =| j and w= 5] (d) v= and w=] 


7 Describe all vectors that are perpendicular to v = (2, -1). Repeat for V = (1, 1, 1). 


8 True or false (give a reason if true or a counterexample if false): 

(a) If u is perpendicular (in three dimensions) to v and w, then v and w are parallel. 

(b) If u is perpendicular to v and w, then u is perpendicular to every combination 
cv + dw. 

(c) There is always a nonzero combination cv + dw that is perpendicular to u. 


9 The slopes of the arrows from (0, 0) to $(v_1, v_2)$ and $(w_1, w_2)$ are $v_2/v_1$ and $w_2/w_1$. 
If the product of those slopes is $v_2 w_2 / v_1 w_1 = -1$, show that v · w = 0 and the 
vectors are perpendicular. 


10 Draw arrows from (0, 0) to the points v = (1, 2) and w = (-2, 1). Write down the 
two slopes and multiply them. That answer is a signal that v · w = 0 and the arrows 
are ____. 


11 If v · w is negative, what does this say about the angle between v and w? Draw a 
3-dimensional vector v (an arrow), and show where to find all w’s with v · w < 0. 


12 With v = (1, 1) and w = (1, 5) choose a number c so that w - cv is perpendicular 
to v. Then find the formula that gives this number c for any v and w. 
13 Find two vectors v and w that are perpendicular to (1, 1, 1) and to each other. 


14 Find three vectors u, v, w that are perpendicular to (1, 1, 1, 1) and to each other. 


15 The geometric mean of x = 2 and y = 8 is $\sqrt{xy} = 4$. The arithmetic mean is larger: 
$\frac{1}{2}(x + y) =$ ____. This came in Example 1.6 from the Schwarz inequality for 
$v = (\sqrt{2}, \sqrt{8})$ and $w = (\sqrt{8}, \sqrt{2})$. Find cos θ for this v and w. 


16 How long is the vector v = (1, 1, ..., 1) in 9 dimensions? Find a unit vector u in 
the same direction as v and a vector w that is perpendicular to v. 


17 What are the cosines of the angles α, β, θ between the vector (1, 0, -1) and the unit 
vectors i, j, k along the axes? Check the formula $\cos^2\alpha + \cos^2\beta + \cos^2\theta = 1$. 


Problems 18-24 lead to the main facts about lengths and angles. Never again will we 
have several proofs in a row. 


18 (Rules for dot products) These equations are simple but useful: (1) v · w = w · v 
(2) u · (v + w) = u · v + u · w (3) (cv) · w = c(v · w) 
Use Rules (1-2) with u = v + w to prove that $\|v + w\|^2 = v \cdot v + 2v \cdot w + w \cdot w$. 


19 (The triangle inequality: length of v + w ≤ length of v + length of w) Problem 18 
found $\|v + w\|^2 = \|v\|^2 + 2v \cdot w + \|w\|^2$. Show how the Schwarz inequality leads to 

$$\|v + w\|^2 \le (\|v\| + \|w\|)^2 \quad\text{or}\quad \|v + w\| \le \|v\| + \|w\|.$$


20 (The perpendicularity test v · w = 0 in three dimensions) 

(a) The right triangle still obeys $\|v\|^2 + \|w\|^2 = \|v - w\|^2$. Show how this leads 
as in equations (1.2)-(1.4) to $v_1 w_1 + v_2 w_2 + v_3 w_3 = 0$. 

(b) Is it also true that $\|v\|^2 + \|w\|^2 = \|v + w\|^2$? Draw this right triangle too. 


21 (Cosine of θ from the formula cos(β - α) = cos β cos α + sin β sin α) 

The figure shows that $\cos\alpha = v_1/\|v\|$ and $\sin\alpha = v_2/\|v\|$. Similarly cos β is ____ 
and sin β is ____. The angle θ is β - α. Substitute into the formula for cos(β - α) 
to find $\cos\theta = v \cdot w / \|v\|\,\|w\|$. 

[Figure for Problem 21: the vectors $v = (v_1, v_2)$ and $w = (w_1, w_2)$.] 
22 (The formula for cos θ from the Law of Cosines) 
With v and w at angle θ, the Law of Cosines gives the length of the third side: 

$$\|v - w\|^2 = \|v\|^2 - 2\|v\|\,\|w\|\cos\theta + \|w\|^2.$$

Compare with (v - w) · (v - w) = ____ to find the formula for cos θ. 


23 (The Schwarz inequality by algebra instead of trigonometry) 

(a) Multiply out both sides of $(v_1 w_1 + v_2 w_2)^2 \le (v_1^2 + v_2^2)(w_1^2 + w_2^2)$. 
(b) Show that the difference between those sides equals $(v_1 w_2 - v_2 w_1)^2$. This 
cannot be negative since it is a square, so the inequality is true. 


24 (One-line proof of the Schwarz inequality |u · U| ≤ 1) 
If $(u_1, u_2)$ and $(U_1, U_2)$ are unit vectors, pick out the step that uses Example 1.6: 

$$|u \cdot U| \le |u_1|\,|U_1| + |u_2|\,|U_2| \le \frac{u_1^2 + U_1^2}{2} + \frac{u_2^2 + U_2^2}{2} = \frac{1 + 1}{2} = 1.$$

Put $(u_1, u_2) = (.6, .8)$ and $(U_1, U_2) = (.8, .6)$ in that whole line and find cos θ. 


25 Why is |cos θ| never greater than 1 in the first place? 


26 Pick any numbers that add to x + y + z = 0. Find the angle between your 
vector v = (x, y, z) and the vector w = (z, x, y). Challenge question: Explain why 
$v \cdot w / \|v\|\,\|w\|$ is always $-\frac{1}{2}$. 


1.3 Planes 


The first sections mostly discussed two-dimensional vectors. But everything in those 
sections—dot products, lengths, angles, and linear combinations cv + dw—follows the 
same pattern when vectors have three components. The ideas also apply in 4 and 5 and 
n dimensions. We start with ordinary “3-space,” and our goal is to find the equation of a 
plane. 

One form of the equation involves dot products. The other form involves linear 
combinations. The dot products and the combinations now appear in equations. The 
equations are the new feature—and planes are the perfect examples. 

Imagine a plane (like a wall of your room). Many vectors are parallel to that plane. 
But only one direction is perpendicular to the plane. This perpendicular direction is the 
best way to describe the plane. The vectors parallel to the plane are identified by a vector n 
to which they are all perpendicular. 

“Perpendicular” means the same as “orthogonal.” Another word is “normal,” and its 
first letter is the reason we use n. You can visualize n pointing straight out from the plane, 
with any nonzero length. If n is a normal vector, so is 2n. So is —n. 


Start with a plane through the origin (0, 0, 0). If the point (x, y, z) lies on the plane, then 
the vector v to that point is perpendicular to n. The vector v has components (x, y, z). The 
normal vector n has components (a, b, c). The geometry of “perpendicular” converts to 
the algebra of “dot product equals zero.” That is exactly the equation we want: The plane 
through (0, 0, 0) perpendicular to n = (a, b, c) has the linear equation 

$$n \cdot v = 0 \quad\text{or}\quad ax + by + cz = 0. \tag{1.5}$$


This is the first plane in Figure 1.9—it goes through the origin. Now look at other planes, 
all perpendicular to the same vector n. They are parallel to the first plane, but they don’t 
go through (0, 0, 0). 

Suppose the particular point $(x_0, y_0, z_0)$ lies in the plane. Other points in the plane 
have other coordinates (x, y, z). Not all points are allowed; that would fill the whole 
space. The normal vector is n = (a, b, c), the vector to $(x_0, y_0, z_0)$ is $v_0$, the vector to 
(x, y, z) is v. Look at the vector $v - v_0$ in Figure 1.9. It is parallel to the plane, which 
makes it perpendicular to n. The dot product of n with $v - v_0$ produces our equation. 


1E The plane that passes through $v_0 = (x_0, y_0, z_0)$ perpendicular to n has the linear 
equation 

$$n \cdot (v - v_0) = 0 \quad\text{or}\quad a(x - x_0) + b(y - y_0) + c(z - z_0) = 0. \tag{1.6}$$

The equation expresses two facts of geometry. The plane goes through the given point 
$(x_0, y_0, z_0)$ and it is perpendicular to n. Remember that x, y, z are variables; they describe 
all points in the plane. The numbers $x_0, y_0, z_0$ are constants; they describe one point. One 
possible choice of x, y, z is $x_0, y_0, z_0$. Then $v = v_0$ satisfies the equation and is a point on 
the plane. 

To emphasize the difference between v and $v_0$, move the constants in $v_0$ to the other 
side. The normal vector n is still (a, b, c): 


1F Every plane perpendicular to n has a linear equation with coefficients a, b, c: 

$$ax + by + cz = ax_0 + by_0 + cz_0 \quad\text{or}\quad ax + by + cz = d. \tag{1.7}$$


Different values of d give parallel planes. The value d = 0 gives a plane through the origin. 


Example 1.7 The plane x + 2y + 3z = 0 goes through the origin. Check: x = 0, y = 0, 
z = 0 satisfies the equation. The normal vector is n = (1, 2, 3). Another normal vector is 
n = (2, 4, 6). The equation 2x + 4y + 6z = 0 produces the same plane. Two points in this 
plane are (x, y, z) = (2, -1, 0) and (x, y, z) = ____. 


[Figure 1.9: the plane through (0, 0, 0) with n · v = 0, and the parallel plane through $v_0$ 
with $n \cdot (v - v_0) = 0$.] 


Figure 1.9 The planes perpendicular to n are n · v = 0 and $n \cdot (v - v_0) = 0$ (which is 
n · v = d and also ax + by + cz = d). 


Example 1.8 Keep the same normal vector (1, 2, 3). Find the parallel plane through 
$(x_0, y_0, z_0) = (1, 1, 1)$. 


Solution The equation is still x + 2y + 3z = d. Choose d = 6, so the point x = 1, y = 1, 
z = 1 satisfies the equation. This number 6 is the same as $ax_0 + by_0 + cz_0 = 1 + 2 + 3$. 
The number d is always $n \cdot v_0$. 
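
The rule that d is the dot product of n with the given point can be sketched in Python as below. The helper names `plane_through` and `on_plane` are hypothetical, and the exact equality test assumes integer (or otherwise exact) coordinates.

```python
def dot(v, w):
    """Multiply matching components and add."""
    return sum(vi * wi for vi, wi in zip(v, w))


def plane_through(n, v0):
    """Return (n, d) for the plane n . v = d that passes through the point v0."""
    return n, dot(n, v0)


def on_plane(n, d, v):
    """True when the point v satisfies n . v = d (exact arithmetic assumed)."""
    return dot(n, v) == d


n, d = plane_through((1, 2, 3), (1, 1, 1))
print(d)                           # the right side for Example 1.8
print(on_plane(n, d, (1, 1, 1)))   # the given point lies on the plane
print(on_plane(n, d, (2, -1, 0)))  # a point of the parallel plane through the origin
```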


Example 1.9 The “xy plane” has the equation z = 0. Its normal vector is (0, 0, 1)— 
this is the standard unit vector k, pointing straight up. A typical point on the xy plane has 
coordinates (x, y, 0). The dot product of the perpendicular vector (0, 0, 1) and the in-plane 
vector (x, y, 0) is zero. 


Distance to a Plane 


How far from the origin is the plane x + 2y + 3z = 6? We know that the particular point 
(1, 1, 1) is on this plane. Its distance from (0, 0, 0) is the square root of $1^2 + 1^2 + 1^2$, which 
is near 1.73. But (1, 1, 1) might not be the point on the plane that is closest to the origin. 
It isn’t. 

Geometry comes to the rescue, if you look at Figure 1.9. The vector $v_0$ happened to 
be chosen in the same direction as n. Other choices were available, because $v_0$ can be any 
point on the plane. But this $v_0$ is the closest point, and it is a multiple of n. The length of 
this $v_0$ is the distance to the plane. 


1G The shortest distance from (0, 0, 0) to the plane n · v = d is |d|/||n||. The vector v 
that gives this shortest distance is $d/\|n\|^2$ times n. It is perpendicular to the plane: 

$$n \cdot v = \frac{d\, n \cdot n}{\|n\|^2} = d \quad\text{and the distance is}\quad \|v\| = \frac{\|dn\|}{\|n\|^2} = \frac{|d|}{\|n\|}. \tag{1.8}$$


Our example has d = 6 and $\|n\| = \sqrt{14}$. The distance $6/\sqrt{14}$ is about 1.6, smaller than 
1.73. The closest point is $v = \frac{6}{14}n$. Notice that $n \cdot v = \frac{6}{14}\, n \cdot n$ which is equal to 6. So v 
reaches the plane. 
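
Statement 1G can be turned into a small Python sketch. The function name `closest_point_to_origin` is our own, and the check for a zero normal vector is an added assumption, not something stated in the text.

```python
import math


def dot(v, w):
    """Multiply matching components and add."""
    return sum(vi * wi for vi, wi in zip(v, w))


def closest_point_to_origin(n, d):
    """Point (d / ||n||^2) n on the plane n . v = d, and its distance |d| / ||n||."""
    nn = dot(n, n)
    if nn == 0:
        raise ValueError("the normal vector must be nonzero")
    point = tuple(d * x / nn for x in n)
    distance = abs(d) / math.sqrt(nn)
    return point, distance


point, distance = closest_point_to_origin((1, 2, 3), 6)
print(point)                  # (6/14) times n
print(distance)               # about 1.6
print(dot((1, 2, 3), point))  # recovers d up to rounding
```

Because the closest point is a multiple of n, a single dot product n · n is all that is needed to locate it.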


Note 1 If the normal vector is a unit vector (||n|| = 1), then the distance is |d|/1 which 
is |d|. To make this happen for the plane x + 2y + 3z = 6, divide the equation by $\sqrt{14}$. 
Then the normal vector has length 1, and the new right side is exactly the distance $6/\sqrt{14}$. 

If we multiply an equation n · v = d by 2, the plane doesn’t change. Its equation is 
now 2n · v = 2d. The distance from (0, 0, 0) is |2d|/||2n||. We cancel the 2’s to see that 
the distance didn’t change. 


Note 2 Why is the shortest $v_0$ always in the direction of n? If v = (x, y, z) is any vector 
to the plane, its dot product with n is n · v = d. That is the equation of the plane! We 
are trying to make the length ||v|| as small as possible. But the Cauchy-Schwarz inequality 
says that 

$$|n \cdot v| \le \|n\|\,\|v\| \quad\text{or}\quad |d| \le \|n\|\,\|v\|. \tag{1.9}$$

Divide by ||n||. The length ||v|| is at least |d|/||n||. That distance is the shortest. 


Example 1.10 If v = (2, 1, 2) is the shortest vector from (0, 0, 0) to a plane, find the 
plane. 


Solution The shortest vector is always perpendicular to the plane. Therefore (2, 1, 2) is a 
normal vector n. The equation of the plane must be 2x + y + 2z = d. The point (2, 1, 2) 
also lies on the plane, so it must satisfy the equation. This requires d = 9. So the plane is 
2x + y + 2z = 9. The distance to it is ____. 


A Plane in Four-Dimensional Space 


Sooner or later, this book has to move to n dimensions. Vectors will have n components. 
Planes will have “dimension” n — 1. This step might as well come now, starting with n = 4: 


$$v = \begin{bmatrix} 3 \\ 0 \\ 0 \\ 1 \end{bmatrix} \quad\text{and}\quad n = \begin{bmatrix} 1 \\ -1 \\ -1 \\ -3 \end{bmatrix} \quad\text{are vectors in 4-dimensional space.}$$

The length of v is $\sqrt{3^2 + 0^2 + 0^2 + 1^2}$ which is $\sqrt{10}$. Similarly ||n|| equals $\sqrt{12}$. Most 
important, v · n is computed as before: it equals $v_1 n_1 + v_2 n_2 + v_3 n_3 + v_4 n_4$. For these 
particular vectors the dot product is 3 + 0 + 0 - 3 = 0. Since v · n = 0, the vectors are 
perpendicular (in four-dimensional space). Then v lies on the plane through (0, 0, 0, 0) 
that is perpendicular to n. 

You see how linear algebra extends into four dimensions—the rules stay the same. 
That applies to vector addition (v + n has components 4, —1, —1, —2). It applies to 2v 
and —v and the linear combination 2v + 3n. The only difference in four dimensions is that 
there are four components. 
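
The four-dimensional arithmetic can be checked with the same code that works in two or three dimensions. The Python sketch below assumes v = (3, 0, 0, 1) and n = (1, -1, -1, -3), the vectors consistent with the lengths, dot product, and sum quoted above; the helper names are illustrative.

```python
def dot(v, w):
    """Multiply matching components and add."""
    return sum(vi * wi for vi, wi in zip(v, w))


def add(v, w):
    """Componentwise sum of two vectors."""
    return tuple(vi + wi for vi, wi in zip(v, w))


def scale(c, v):
    """Multiply every component of v by the number c."""
    return tuple(c * vi for vi in v)


v = (3, 0, 0, 1)
n = (1, -1, -1, -3)
print(dot(v, n))                      # zero: v is perpendicular to n
print(add(v, n))                      # the sum v + n
print(add(scale(2, v), scale(3, n)))  # the linear combination 2v + 3n
```

Nothing in these functions mentions the number of components, which is the sense in which the rules stay the same in four dimensions.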

There is a big difference in the geometry: we can’t draw pictures. It is hard enough 
to represent 3-dimensional space on a flat page. Four or five dimensions are virtually 
impossible. But no problem to do the algebra: the equation is n · v = 0 or n · v = d or 
x - y - z - 3t = d. 


Definition A plane contains all points v that satisfy one linear equation n · v = d. 


The plane goes through the origin if d = 0, because v = 0 satisfies the equation. To 
visualize a plane in high dimensions, think first of n. This vector sticks out perpendicular 
to a plane that is “one dimension lower.” 

By this definition, a plane in two-dimensional space is an ordinary straight line. (We 
suppose a plane in one-dimensional space is an ordinary point.) The equation x + 2y = 3 
gives a line perpendicular to n = (1, 2). A plane in five dimensions is four-dimensional 
and in some way this plane is flat. We can’t really see that (maybe Einstein could). When 
our vision fails we fall back on $n \cdot (v - v_0) = 0$ or n · v = d. 


Figure 1.10 Linear combinations produce a plane. 


Linear Combinations Produce Planes 


So far planes have come from dot products. They will now come from linear combinations. 
As before, we start with a plane through (0,0,0). Instead of working with the normal 
vector n, we work with the vectors v that are in the plane. 


Example 1.11 Describe all points in the plane x — y — 3z = 0. 


Solution Two points in this plane are (1, 1, 0) and (3, 0, 1). A third point is (0, 0, 0). 
To describe all the points, the short answer is by dot products. We require n · v = 0 or 
(1, -1, -3) · (x, y, z) = 0. The new and different answer is to describe every vector v that 
satisfies this equation. 


The vectors in the plane x - y - 3z = 0 are linear combinations of two “basic vectors”: 

$$v = y\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + z\begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix} \quad\text{or}\quad v = \begin{bmatrix} y + 3z \\ y \\ z \end{bmatrix}. \tag{1.10}$$


The first component of v is x = y + 3z. This is the original equation! The linear com- 
bination (1.10) displays all points in the plane. Any two numbers y and z are allowed in 
the combination, so the plane in Figure 1.10 is two-dimensional. The free choice does not 
extend to x, which is determined by y + 3z. 

Now move the plane away from (0, 0, 0). Suppose its equation is x - y - 3z = 2. 
That is n · v = d. It describes the plane by a dot product. The other way to describe the 
plane is to display every v that satisfies this equation. Here is a systematic way to display 
all points v on the plane. 

Advice on $v_0$ Choose one particular point $v_0$ that has y = z = 0. The equation 
x - y - 3z = 2 requires that x = 2. So the point is $v_0 = (2, 0, 0)$. In Chapter 2 this is called a 
“particular solution.” 


Advice on v Starting from $v_0$, add to it all vectors that satisfy x - y - 3z = 0. The right 
side is now zero! This is the equation for the difference $v - v_0$. When we find the solutions, 
we add them to $v_0 = (2, 0, 0)$. The original plane through (0, 0, 0) is just being shifted out 
parallel to itself, to go through the new point (2, 0, 0). 


We already solved x — y — 3z = 0. A first solution is (1, 1,0), when y = 1 and 
z = 0. For the second solution switch to y = 0 and z = 1. The equation gives x = 3. A 
second solution is (3, 0, 1). Now comes the linear algebra: Take all linear combinations of 
these two solutions. 


The points v - v0 on the plane x - y - 3z = 0 (notice the zero) are the combinations


\[ v - v_0 = y \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + z \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}. \tag{1.11} \]


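The points v on the plane x - y - 3z = 2 (notice the 2) move out by v0 = (2, 0, 0):


\[ v = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix} + y \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + z \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}. \tag{1.12} \]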


This is the description by linear combinations. Go out to vo and then travel along 
the plane. The particular vector vo is multiplied by the fixed number one. The vector 3vo 
would not be on the plane. The two in-plane vectors are multiplied by any numbers y and z. 
All those linear combinations fill out the plane. 
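
The description by linear combinations can be checked numerically. The following is a minimal Python sketch, not part of the original text; the helper names `point_on_plane` and `plane_value` are made up for illustration. It builds points from v0 = (2, 0, 0) and the two in-plane vectors, and confirms that each one satisfies x - y - 3z = 2.

```python
# Particular point v0 plus any combination of the two in-plane vectors.
v0 = (2, 0, 0)   # satisfies x - y - 3z = 2 with y = z = 0
s1 = (1, 1, 0)   # solves x - y - 3z = 0 (choose y = 1, z = 0)
s2 = (3, 0, 1)   # solves x - y - 3z = 0 (choose y = 0, z = 1)

def point_on_plane(y, z):
    """Return v = v0 + y*s1 + z*s2, the combination in equation (1.12)."""
    return tuple(p + y * a + z * b for p, a, b in zip(v0, s1, s2))

def plane_value(v):
    """Left side x - y - 3z of the plane equation."""
    x, y, z = v
    return x - y - 3 * z

for y, z in [(0, 0), (1, 0), (0, 1), (-2, 5)]:
    v = point_on_plane(y, z)
    print(v, plane_value(v))   # the second value is 2 for every choice of y and z
```

Multiplying v0 itself by 3 gives (6, 0, 0), and `plane_value` returns 6 for that point, which is why the particular vector is multiplied only by the fixed number one.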


Problem Set 1.3 
1 Find two points on the plane 3x + y — z = 6 and also find the normal vector n. Verify 
that the vector between the two points is perpendicular to n. 


2 The plane 4x - y - 2z = 1 is parallel to the plane ____ and perpendicular to the
vector ____.


3 True or false (give an example in either case): 


(a) Two planes perpendicular to the same vector n are parallel. 
(b) Two planes containing the same vector v are parallel. 


(c) Two planes that do not intersect are parallel. 
4 Find equations ax + by + cz = d for these three planes: 


(a) Through the point (2, 2, 1) and perpendicular to n = (1, 5, 2). 


(b) Through the point (1, 5, 2) and perpendicular to n = (1, 5, 2).
(c) The “xz plane” containing all points (x, 0, z). See Example 1.3. 


5 If you reverse the sign of d, what does that do to the plane? Describe the position of
x + y + z = -2 compared to x + y + z = 2.


6 Find the equation of the plane through (0, 0, 0) that contains all linear combinations
of v = (1, 1, 0) and w = (1,0, 1). 


7 Find the equation ax + by = c of the line through (0, 0) perpendicular to n = (1, 4).
What is the equation of the parallel line through (2, 3)? 


8 Choose three convenient points u, v, w on the plane x + y + z = 2. Their combinations
cu + dv + ew lie on the same plane if c + d + e = ____. (Those are affine
combinations.)


9 Find the equation ax + by + cz + dt = e of the plane through (1, 1, 1, 1) in four-
dimensional space. Choose the plane to be perpendicular to n = (1, 4, 1, 2). 


10 Find the equation of the plane through (0, 0, 0) that contains the vectors (4, 1, 0) and
(-2, 0, 1).


Questions 11-16 ask for all points v that satisfy a linear equation. Follow the advice 
on v and vo in the last paragraphs of this section. 


11 Choose a particular point v0 on the plane x - 3y - z = 6. Then choose two in-plane
vectors that satisfy x - 3y - z = 0. Combine them with v0 as in equation (1.12) to
display all points on the plane.


12 What is the particular solution v0 to 2x + 4y - z = 0, when you follow the advice in
the text? Write all vectors in this plane as combinations of two vectors. 


13 Find a very particular solution v0 to x + y + z = 0. Then choose two nonzero
solutions. Combine them to display all solutions. 


14 Choose a particular point on the line x - 3y = 9, and also a solution to x - 3y = 0.
Combine them as in (1.12) to display all points on the line. 


15 Choose a particular solution v0 (four components) to x + 2y + 3z + 4t = 24. Then
find solutions to x + 2y + 3z + 4t = 0 of the form (x, 1, 0, 0) and (x, 0, 1, 0) and
(x, 0, 0, 1). Combine v0 with these three solutions to display all points v on the plane
in four dimensions.


16 The advice on choosing v0 will fail if the equation of the plane is 0x + 2y + 3z = 12.
When we choose y = z = 0 we can’t solve for x. Give advice on this case: Choose vo 
and then choose two solutions to 0x + 2y + 3z = 0 and take their combinations. 


Questions 17-22 are about the distance to a plane.


17 The text found 6/√14 as the distance from (0, 0, 0) to the plane x + 2y + 3z = 6.
Find the distance to 2x + 4y + 6z = 12 from the formula |d|/||n||. Explain why this
still equals 6/√14.


18 How far is the origin from the following planes? Find the nearest point v = dn/||n||^2:
(a) 2x + 2y + z = 18 (b) x - 3y - 7z = 0 (c) x - z = 6


19 This question asks for the distance from (0, 0) to the line x - 2y = 25.


(a) Why is the shortest vector in the direction of (1, —2)? 
(b) What point (t, —2t) in that direction lies on the line? 
(c) What is the distance from (0, 0) to the line? 


20 This follows Problem 19 for a general line ax + by = c.


(a) The shortest vector from (0, 0) is in the direction of n = ____.
(b) The point tn lies on the line if t = ____.
(c) The shortest distance is |c|/√(a^2 + b^2) because ____.


21 The shortest distance from (1, 0, 5) to the plane x + 2y - 2z = 27 is in the direction
n = (1, 2, -2). The point (1, 0, 5) + t(1, 2, -2) lies on the plane when t = ____.
Therefore the shortest distance is ||tn|| = ____.


22 The shortest distance from a point w to the plane n · v = d is in the direction of
____. The point v = w + tn lies on the plane when t = ____. Therefore the
shortest distance is ||tn|| = ____. When w = 0 this distance should be |d|/||n||.


23 The two planes x + 2y + 3z = 12 and x - y - z = 2 intersect in a line (in three
dimensions). The two vectors ____ are perpendicular to the line. The two points ____
are on the line.


24 The equation x + y + z - t = 1 represents a plane in 4-dimensional space. Find


(a) its normal vector n

(b) its distance to (0, 0, 0, 0) from the formula |d|/||n||

(c) the nearest point to (0, 0, 0, 0)

(d) a point v0 = (x, 0, 0, 0) on the plane

(e) three vectors in the parallel plane x + y + z - t = 0

(f) all points that satisfy x + y + z - t = 1, following equation (1.12).


25 For which n will the plane n · v = 0 fail to intersect the plane x + y + z = 1?


26 At what angle does the plane y + z = 0 meet the plane x + z = 0?




1.4 Matrices and Linear Equations 


The central problem of linear algebra is to solve linear equations. There are two ways to 
describe that problem—first by rows and then by columns. In this chapter we explain the 
problem, in the next chapter we solve it. 

Start with a system of three equations in three unknowns. Let the unknowns be
x, y, z, and let the linear equations be


     x + 2y + 3z = 6
    2x + 5y + 2z = 4        (1.13)
    6x - 3y +  z = 2.


We look for numbers x, y, z that solve all three equations at once. Those numbers might 
or might not exist. For this system, they do exist. When the number of equations matches 
the number of unknowns, there is usually one solution. The immediate problem is how to 
visualize the three equations. There is a row picture and a column picture. 


R The row picture shows three planes meeting at a single point. 


The first plane comes from the first equation x + 2y + 3z = 6. That plane crosses the x, y, z
axes at (6, 0, 0) and (0, 3, 0) and (0, 0, 2). The plane does not go through the origin—the 
right side of its equation is 6 and not zero. 

The second plane is given by the equation 2x + 5y + 2z = 4 from the second row. 
The numbers 0, 0, 2 satisfy both equations, so the point (0, 0, 2) lies on both planes. Those 
two planes intersect in a line L, which goes through (0, 0, 2). 

The third equation gives a third plane. It cuts the line L at a single point. That point 
lies on all three planes. This is the row picture of equation (1.13)—three planes meeting 
at a single point (x, y, z). The numbers x, y, z solve all three equations. 


Figure 1.11 Row picture of three equations: Three planes meet at a point. The line L is on both of the first two planes (one of them is 2x + 5y + 2z = 4), and L meets the third plane at the solution.


Figure 1.12 Column picture of three equations: Combinations of columns are b =
2 × column 3 so (x, y, z) = (0, 0, 2); b = sum of columns (col 1 + col 2 + col 3) so (x, y, z) = (1, 1, 1).


C The column picture combines the columns on the left side to produce the right side. 


Write the three equations as one vector equation based on columns: 


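\[ x \begin{bmatrix} 1 \\ 2 \\ 6 \end{bmatrix} + y \begin{bmatrix} 2 \\ 5 \\ -3 \end{bmatrix} + z \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \\ 2 \end{bmatrix}. \tag{1.14} \]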


The row picture has planes given by dot products. The column picture has this linear 
combination. The unknown numbers x, y, z are the coefficients in the combination. We 
multiply the columns by the correct numbers x, y, z to give the column (6, 4, 2). 

For this particular equation we know the right combination (we made up the prob- 
lem). If x and y are zero, and z equals 2, then 2 times the third column agrees with the 
column on the right. The solution is x = 0, y = 0, z = 2. That point (0, 0, 2) lies on all 
three planes in the row picture. It solves all three equations. The row and column pictures 
show the same solution in different ways. 

For one moment, change to a new right side (6,9, 4). This vector equals column 
1+ column 2 + column 3. The solution with this new right side is (x, y,z) = ___. The 
numbers x, y, z multiply the columns to give b. 
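
The two pictures can be compared with a short computation. This is an illustrative Python sketch, not from the original text; the function names `by_rows` and `by_columns` are assumptions chosen for this example. The row version tests whether a point lies on each plane, and the column version adds up multiples of the columns.

```python
# Coefficients and right side of system (1.13).
A = [[1, 2, 3],
     [2, 5, 2],
     [6, -3, 1]]
b = [6, 4, 2]

def by_rows(A, x):
    """Row picture: one dot product per equation (one plane at a time)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def by_columns(A, x):
    """Column picture: x[0]*(column 1) + x[1]*(column 2) + x[2]*(column 3)."""
    result = [0] * len(A)
    for j, xj in enumerate(x):
        for i in range(len(A)):
            result[i] += xj * A[i][j]
    return result

x = [0, 0, 2]
print(by_rows(A, x) == b)      # the point lies on all three planes
print(by_columns(A, x) == b)   # 2 times column 3 produces b
```

Calling `by_columns(A, [1, 1, 1])` adds the three columns and gives the new right side (6, 9, 4) mentioned in the text.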


The Matrix Form of the Equations 


We have three rows in the row picture and three columns in the column picture (plus the 
right side). The three rows and columns contain nine numbers. These nine numbers fill a 
3 by 3 matrix. We are coming to the matrix picture. The “coefficient matrix” has the rows 
and columns that have so far been kept separate: 


\[ \text{The coefficient matrix is} \quad A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 2 \\ 6 & -3 & 1 \end{bmatrix}. \]

The capital letter A stands for all nine coefficients (in this square array). The letter b
will denote the column vector of right hand sides. The components of b are 6, 4, 2. The 
unknown is also a column vector, with components x, y, z. We call it x (boldface because 
it is a vector, x because it is unknown). By rows the equations were (1.13), by columns 
they were (1.14), and now by matrices they are (1.15). The shorthand is Ax = b: 


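\[ \text{Matrix notation:} \quad \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 2 \\ 6 & -3 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \\ 2 \end{bmatrix} \quad \text{is} \quad Ax = b. \tag{1.15} \]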


We multiply the matrix A times the unknown vector x to get the right side b. 

Basic question: What does it mean to “multiply A times x”? We can do it by rows 
or we can do it by columns. Either way, Ax = b must be a correct representation of the 
three equations. So there are two ways to multiply A times x: 


Multiplication by rows Ax comes from dot products, a row at a time: 


\[ Ax = \begin{bmatrix} (\text{row 1}) \cdot x \\ (\text{row 2}) \cdot x \\ (\text{row 3}) \cdot x \end{bmatrix}. \tag{1.16} \]


Multiplication by columns Ax is a linear combination of the columns: 
Ax = x (column 1) + y (column 2) + z (column 3). (1.17) 


However you do it, all five of our equations (1.13)-(1.17) show A times x. 


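Examples

\[ Ax = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \\ 4 \end{bmatrix} \qquad Ax = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} \]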

In the first example Ax is (4, 4, 4). If you are a row person, the dot product of every row 
with (4, 5, 6) is 4. If you are a column person, the linear combination is 4 times the first 
column (1, 1, 1). In that matrix the second and third columns are zero vectors. 

The second example deserves a careful look, because the matrix is special. It has 1’s 
on the “main diagonal.” Off that diagonal are 0’s. Whatever vector this matrix multiplies, 
that vector is not changed. This is like multiplication by 1, but for matrices and vectors. 
The exceptional matrix in this example is the 3 by 3 identity matrix, and it is denoted by I:


\[ I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{always yields the multiplication } Iv = v. \]

Remark The two ways of looking at a matrix (rows or columns) are the keys to linear 
algebra. We can choose between them (when we compute), or we can hold onto both (when 
we think). The computing decision depends on how the matrix is stored. A vector machine
like the CRAY uses columns. Also FORTRAN is slightly more efficient with columns.
But equations are usually solved by row operations; this is the heart of the next chapter.


Figure 1.13 Row picture for 2 by 2 (intersecting lines). Column picture (linear combination produces b): 2(col 1) + 3(col 2) = b.


We now review the row and column and matrix pictures when there are two equations: 


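\[ \begin{aligned} 3x - y &= 3 \\ x + y &= 5 \end{aligned} \qquad \begin{bmatrix} 3 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3 \\ 5 \end{bmatrix}. \]

Those give two lines in the xy plane. They also give two columns, to be combined into
b = (3, 5). Figure 1.13 shows both pictures.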


R (rows) The first row gives the line 3x — y = 3. The second row gives the second line 
x + y = 5. Those lines cross at the solution x = 2, y = 3. 


C (columns) A combination of the columns gives the vector on the right side: 


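\[ 2 \begin{bmatrix} 3 \\ 1 \end{bmatrix} + 3 \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 5 \end{bmatrix}. \]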


M (matrix) The two by two coefficient matrix times the solution vector (2, 3) equals b: 


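\[ \begin{bmatrix} 3 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 5 \end{bmatrix}. \]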


Multiply by dot products if you prefer rows. Combine columns if you like vectors. This 
example is a small review of the whole chapter. Dot products and linear combinations are 
the two routes to Ax. 


This book sees Ax as a combination of columns of A. Even if you compute by dot 
products, please hold on to multiplication by columns. 


Possible Breakdown We hate to end this chapter on a negative note, but the equation 
Ax = b might not have a solution. The two lines in the xy plane might be parallel. The 
three planes in xyz space can fail to intersect in a single point. There are two types of
failure, either no solution or too many solutions:


(a) The two lines or three planes intersect at no points. 


(b) The lines or planes intersect at infinitely many points. 


These cases are “exceptional” but very possible. If you write down equations at random,
breakdown won’t happen (almost certainly). The probability of failure is zero. But the 
equations of applied mathematics are not random—exceptions that have zero probability 
happen every day. When we solve Ax = b in Chapter 2, we watch closely for what can go 
wrong. 


Matrix Notation The four entries in a 2 by 2 matrix are a11, a12 in the first row and a21, 
a22 in the second row. The first index gives the row number. Thus aij or A(i, j) is an entry
in row i. The second index j gives the column number (we see this in detail in Section 2.2). 
The entry a12 is in row 1, column 2: 


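\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \quad \text{or} \quad A = \begin{bmatrix} A(1,1) & A(1,2) \\ A(2,1) & A(2,2) \end{bmatrix}. \]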


For an m by n matrix, the row number i goes from 1 to m and j stops at n. 

MATLAB enters a matrix by rows, with a semicolon between: A = [3 -1; 1 1]. If it
knows the columns a = [3 1]' and c = [-1 1]', then also A = [a c]. The capital letter A is
different from the small letter a.


Matrix times Vector In MATLAB this is A*v. The software might multiply by rows 
or by columns, who knows? (We think it is by columns.) In a language that works with 
entries A(I, J) and components V(J), the decision is ours. This short program takes the dot 
product with row I in the inner loop, where it sums over J:


      DO 10 I = 1,2
      DO 10 J = 1,2
   10 B(I) = B(I) + A(I,J) * V(J)

Problem 23 has the loops in the opposite order. Then the inner loop (with J fixed) goes 
down the column. That code computes Av as a combination of the columns. 
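
For readers who prefer a language with explicit loops and zero-based indices, the two loop orders might be sketched in Python as follows. This is an illustrative translation, not the book's code; the function names are made up, and the result vector is set to zero before the loops begin (the FORTRAN fragment assumes B starts at zero).

```python
def matvec_row_order(A, v):
    """Outer loop over rows i, inner loop over j: each b[i] is a dot product."""
    m, n = len(A), len(v)
    b = [0] * m
    for i in range(m):
        for j in range(n):
            b[i] += A[i][j] * v[j]
    return b

def matvec_column_order(A, v):
    """Outer loop over columns j, inner loop over i: add v[j] times column j."""
    m, n = len(A), len(v)
    b = [0] * m
    for j in range(n):
        for i in range(m):
            b[i] += A[i][j] * v[j]
    return b

A = [[3, -1],
     [1, 1]]
v = [2, 3]
print(matvec_row_order(A, v), matvec_column_order(A, v))   # both give (3, 5)
```

Both functions perform the same multiplications and reach the same answer; only the order of the steps differs. Because the loop limits come from the sizes of `A` and `v`, the same functions also handle a matrix with 4 rows and 3 columns.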


This one line is to reveal that MATLAB solves Av = b by v = A\b. 


Problem Set 1.4 


Problems 1-8 are about the row and column pictures of Ax = b. 


1 With A = I (the identity matrix) draw the planes in the row picture. Three sides of 
a box meet at the solution: 


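\[ \begin{aligned} 1x + 0y + 0z &= 2 \\ 0x + 1y + 0z &= 3 \\ 0x + 0y + 1z &= 4 \end{aligned} \quad \text{or} \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}. \]
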

2 Draw the vectors in the column picture of Problem 1. Two times column 1 plus three
times column 2 plus four times column 3 equals the right side b. 


3 If the equations in Problem 1 are multiplied by 1, 2, 3 they become 


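\[ \begin{aligned} 1x + 0y + 0z &= 2 \\ 0x + 2y + 0z &= 6 \\ 0x + 0y + 3z &= 12 \end{aligned} \quad \text{or} \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 6 \\ 12 \end{bmatrix}. \]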


Why is the row picture the same? Is the solution the same? The column picture is 
not the same—draw it. 


4 If equation 1 is added to equation 2, which of these are changed: the row picture, the 
column picture, the coefficient matrix, the solution? The new equations in Problem 1 
would be x = 2, x + y = 5, z = 4.


5 Find any point on the line of intersection of the planes x + y + 3z = 6 and x - y + z = 4.
By trial and error find another point on the line. 


6 The first of these equations plus the second equals the third: 


x+ y+ z=2 
x+2y+ z=3 
2x + 3y+2z=5. 


The first two planes meet along a line. The third plane contains that line, because
if x, y, z satisfy the first two equations then they also ____. The equations have
infinitely many solutions (the whole line). Find three solutions.


7 Move the third plane in Problem 6 to a parallel plane 2x + 3y + 2z = 9. Now the 
three equations have no solution—why not? The first two planes meet along a line, 
but the third plane doesn’t cross that line (artists please draw). 


8 In Problem 6 the columns are (1, 1, 2) and (1, 2, 3) and (1, 1, 2). This is a “singular
case” because the third column is ____. Find two combinations of the three
columns that give the right side (2, 3, 5).


Problems 9-14 are about multiplying matrices and vectors. 


9 Compute each Ax by dot products of the rows with the column vector: 


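\[ \text{(a)} \; \begin{bmatrix} 1 & 2 & 4 \\ -2 & 3 & 1 \\ -4 & 1 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix} \qquad \text{(b)} \; \begin{bmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 2 \end{bmatrix} \]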
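

10 Compute each Ax in Problem 9 as a combination of the columns:


\[ \text{9(a) becomes} \quad Ax = 2 \begin{bmatrix} 1 \\ -2 \\ -4 \end{bmatrix} + 2 \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} + 3 \begin{bmatrix} 4 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} \; \\ \; \\ \; \end{bmatrix}. \]


How many separate multiplications for Ax, when the matrix is “3 by 3”?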


11 Find the two components of Ax by rows or by columns:


EE] = & JL] © [3 Jp 


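12 Multiply A times x to find three components of Ax:


\[ \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 2 & 1 & 3 \\ 1 & 2 & 3 \\ 3 & 3 & 6 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 2 & 1 \\ 1 & 2 \\ 3 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \]
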

13 (a) A matrix with m rows and n columns multiplies a vector with ____ components
to produce a vector with ____ components.
(b) The planes from the m equations Ax = b are in ____-dimensional space.
The combination of the columns of A is in ____-dimensional space.


14 (a) How would you define a linear equation in three unknowns x, y, z?


(b) If v0 = (x0, y0, z0) and v1 = (x1, y1, z1) both satisfy the linear equation, then
so does cv0 + dv1, provided c + d = ____.


(c) All combinations of v0 and v1 satisfy the equation when the right side is ____.


Problems 15-22 ask for matrices that act in special ways on vectors. 



(a) What is the 2 by 2 identity matrix? J times [3 | equals [3 ]. 
(b) What is the 2 by 2 exchange matrix? P times | 3 | equals | |. 


(a) What 2 by 2 matrix R rotates every vector by 90°? R times [ }, | is [_? ]. 
(b) What 2 by 2 matrix rotates every vector by 180°? 


17 What 3 by 3 matrix P permutes the vector (x, y, z) to (y, z, x)? What matrix P^-1
permutes (y, z, x) back to (x, y, z)?


18 What 2 by 2 matrix E subtracts the first component from the second component?
What 3 by 3 matrix does the same? 


3 3 
E = A and | a eo el ean 
7 7 


19 What 3 by 3 matrix E multiplies (x, y, z) to give (x, y, z + x)? What matrix E^-1
multiplies (x, y, z) to give (x, y, z - x)? If you multiply (3, 4, 5) by E and then
multiply by E^-1, the two results are ( ____ ) and ( ____ ).


20 What 2 by 2 matrix P1 projects the vector (x, y) onto the x axis to produce (x, 0)?
What matrix P2 projects onto the y axis to produce (0, y)? If you multiply (5, 7)
by P1 and then multiply by P2, the two results are ( ____ ) and ( ____ ).


21 What 2 by 2 matrix R rotates every vector through 45°? The vector (1, 0) goes to
(√2/2, √2/2). The vector (0, 1) goes to (-√2/2, √2/2). Those determine the
matrix. Draw these particular vectors in the xy plane and find R.


22 Write the dot product of (1, 4, 5) and (x, y, z) as a matrix multiplication Ax. The
matrix A has one row. The solutions to Ax = 0 lie on a ____. The columns of A
are only in ____-dimensional space.


23 Which code finds the dot product of V with row 1 and then row 2? Which code finds
column 1 times V(1) plus column 2 times V(2)? Read the paragraph just before this
problem set. If A has 4 rows and 3 columns, what changes are needed in the codes?


      DO 10 I = 1,2                          DO 10 J = 1,2
      DO 10 J = 1,2                          DO 10 I = 1,2
   10 B(I) = B(I) + A(I,J) * V(J)         10 B(I) = B(I) + A(I,J) * V(J)


24 In both codes the first step is B(1) = A(1, 1) * V(1). Write the other three steps in
the order executed by each code.


25 In three lines of MATLAB, enter a matrix and a column vector and multiply them.


Questions 26-28 are a review of the row and column pictures. 


26 Draw each picture in a plane for the equations x - 2y = 0, x + y = 6.


27 For two linear equations in three unknowns x, y, z, the row picture will show (2 or 3)
(lines or planes). Those lie in (2 or 3)-dimensional space. The column picture is in 
(2 or 3)-dimensional space. 


28 For four linear equations in two unknowns x and y, the row picture shows four
____. The column picture is in ____-dimensional space. The equations have no
solution unless the vector on the right side is a combination of ____.


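29 (Markov matrix) Start with the vector u0 = (1, 0). Multiply again and again by the
same matrix A. The next three vectors are u1, u2, u3:


\[ u_1 = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} .8 \\ .2 \end{bmatrix} \qquad u_2 = A u_1 = \underline{\qquad} \qquad u_3 = A u_2 = \underline{\qquad}. \]


What property do you notice for all four vectors u0, u1, u2, u3?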


30 With a computer continue Problem 29 as far as the vector u7. Then from v0 = (0, 1)
compute v1 = Av0 and v2 = Av1 on to v7. Do the same from w0 = (.5, .5) as far
as w7. What do you notice about u7, v7 and w7? Extra: Plot the results by hand or
computer. Here is a MATLAB code; you can use other languages:


u = [1; 0]; A = [.8 .3; .2 .7];
x = u; k = [0:1:7]; 
while length(x) <= 7 
u = A*u; x = [x u]; 
end 
plot(k, x) 
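
Since other languages are allowed, the same iteration might be sketched in Python. This is an illustrative version rather than a translation of the plotting code: the names `step` and `iterate` are made up, exact fractions are used in place of decimals so the sums can be checked without rounding, and no plot is drawn.

```python
from fractions import Fraction as F

# The Markov matrix of Problem 29, entered by rows.
A = [[F(8, 10), F(3, 10)],
     [F(2, 10), F(7, 10)]]

def step(A, u):
    """One multiplication: return A times u."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]

def iterate(A, u0, steps=7):
    """Return the list [u0, u1, ..., u_steps] with u_(k+1) = A u_k."""
    history = [u0]
    for _ in range(steps):
        history.append(step(A, history[-1]))
    return history

for start in ([F(1), F(0)], [F(0), F(1)], [F(1, 2), F(1, 2)]):
    last = iterate(A, start)[-1]
    print([float(c) for c in last], "sum =", float(sum(last)))
```

Printing the component sums alongside the seventh vector is one way to look for the property asked about in Problem 29.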


31 The u’s and v’s and w’s are approaching a steady state vector s. Guess that vector
and check that As = s. This vector is “steady” because if you start with s, you stay 
with s. 


32 This MATLAB code allows you to input the starting column u0 = [a; b; 1 - a - b]
with a mouse click. These three components should be positive. (Click the left button
on your point (a, b) or on several points. Add the instruction disp(u') after the first
end to see the steady state in the text window. Right button aborts.) The matrix A is
entered by rows; its columns add to 1.


A = [.8 .2 .1; .1 .7 .3; .1 .1 .6]
axis([0 1 0 1]); axis('square')
plot(0, 0); hold on
title('Markov-your name'); xlabel('a'); ylabel('b'); grid
but = 1;
while but == 1
    [a,b,but] = ginput(1)
    u = [a; b; 1-a-b];
    x = u; k = [0:1:7];
    while length(x) <= 7
        u = A*u; x = [x u];
    end
    plot(x(1,:), x(2,:), x(1,:), x(2,:), 'o');
end
hold off


33 Invent a 3 by 3 magic matrix M3 with entries 1, 2, ..., 9. All rows and columns
and diagonals add to 15. The first row could be 8, 3, 4. What is M3 times (1, 1, 1)?
What is M4 times (1, 1, 1, 1) if this magic matrix has entries 1, ..., 16?


SOLVING LINEAR EQUATIONS 


2.1 The Idea of Elimination 


This chapter explains a systematic way to solve linear equations. The method is called 
“elimination”, and you can see it immediately in a 2 by 2 example. (2 by 2 means two 
equations and two unknowns.) Before elimination, x and y appear in both equations. After 
elimination, the first unknown x has disappeared from the second equation: 


Before:      x - 2y = 1          After:      x - 2y = 1
            3x + 2y = 11                          8y = 8.

Solving the second pair is much easier. The last equation 8y = 8 instantly gives y = 1. 
Substituting for y in the first equation leaves x — 2 = 1. Therefore x = 3 and the solution 
is complete. 

Elimination produces an upper triangular system—this is the goal. The nonzero 
coefficients 1, —2, 8 form a triangle. To solve this system, x and y are computed in reverse 
order (bottom to top). The last equation 8y = 8 yields the last unknown, and we go upward 
to x. This quick process is called back substitution. It is used for upper triangular systems 
of any size, after forward elimination is complete. 

Important point: The original equations have the same solution x = 3 and y = 1. 
The combination 3x + 2y equals 11 as it should. Figure 2.1 shows this original system as a 
pair of lines, intersecting at the solution point (3, 1). After elimination, the lines still meet 
at the same point. One line is horizontal because its equation 8y = 8 does not contain x. 

Also important: How did we get from the first pair of lines to the second pair? We 
subtracted 3 times the first equation from the second equation. The step that eliminates x 
is the fundamental operation in this chapter. We use it so often that we look at it closely: 


To eliminate x: Subtract a multiple of equation 1 from equation 2. 

Three times x - 2y = 1 gives 3x - 6y = 3. When this is subtracted from 3x + 2y = 11,
the right side becomes 8. The main point is that 3x cancels 3x. What remains on the left
side is 2y - (-6y) or 8y. Therefore 8y = 8, and x is eliminated.




Figure 2.1 Two lines meet at the solution. So does the new line 8y = 8. Left panel: before elimination, the lines x - 2y = 1 and 3x + 2y = 11. Right panel: after elimination.


Ask yourself how that multiplier l = 3 was found. The first equation contained x.
The first pivot was 1 (the coefficient of x). The second equation contained 3x, so the first 
equation was multiplied by 3. Then subtraction produced the zero. 

You will see the general rule if we change the first equation to 5x - 10y = 5. (Same
straight line.) The first pivot is now 5. The multiplier is now l = 3/5. To find the multiplier,
divide by the pivot:


Multiply equation 1 by 3/5        5x - 10y = 5      becomes      5x - 10y = 5
Subtract from equation 2          3x +  2y = 11                       8y = 8.


The system is triangular and the last equation still gives y = 1. Back substitution produces 
5x — 10 = 5 and 5x = 15 and x = 3. Multiplying the first equation by 5 changed the 
numbers but not the lines or the solution. Here is the rule to eliminate a coefficient below 
a pivot: 

    Multiplier l = (coefficient to eliminate) / pivot = 3/5.

The new second equation contains the second pivot, which is 8. It is the coefficient of y. 
We would use it to eliminate y from the third equation if there were one. To solve n 
equations we want n pivots. 
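
The multiplier rule and back substitution for a 2 by 2 system can be sketched in a few lines of Python. This is an illustration only, not code from the text; `solve_2x2` is a made-up name, and the sketch assumes both pivots are nonzero (the breakdown cases are the subject of the next paragraphs).

```python
from fractions import Fraction

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Eliminate x from equation 2, then back-substitute. Assumes nonzero pivots."""
    pivot1 = Fraction(a11)
    l = Fraction(a21) / pivot1       # multiplier = coefficient to eliminate / pivot
    pivot2 = a22 - l * a12           # new coefficient of y in equation 2
    rhs2 = b2 - l * b1               # new right side of equation 2
    y = rhs2 / pivot2                # back substitution starts with the last equation
    x = (b1 - a12 * y) / pivot1
    return l, pivot2, (x, y)

print(solve_2x2(1, -2, 1, 3, 2, 11))    # first equation x - 2y = 1
print(solve_2x2(5, -10, 5, 3, 2, 11))   # first equation 5x - 10y = 5
```

The two calls differ only in the first equation. The multiplier changes from 3 to 3/5, while the second pivot and the solution stay the same.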

You could have solved those equations for x and y without reading this book. It is 
an extremely humble problem, but we stay with it a little longer. Elimination might break 
down and we have to see how. 


Breakdown of Elimination 


Normally, elimination produces the pivots that take us to the solution. But failure is possi- 
ble. The method might go forward up to a certain point, and then ask us to divide by zero. 
We can’t do it. The process has to stop. There might be a way to adjust and continue—or 
failure may be unavoidable. Example 2.1 fails with no solution, Example 2.2 fails with too 
many solutions, and Example 2.3 succeeds after a temporary breakdown. 

If the equations have no solution, then elimination must certainly have trouble. 

Figure 2.2 Row picture and column picture when elimination fails. The first column (1, 3) and the second column (-2, -6) point along the same line, so the columns don't combine to give the plane.


Example 2.1 Permanent failure with no solution. Elimination makes this clear: 


     x - 2y = 1       subtract 3 times          x - 2y = 1
    3x - 6y = 11      eqn. 1 from eqn. 2            0y = 8.


The last equation is 0y = 8. There is no solution. Normally we divide the right side 8 by
the second pivot, but this system has no second pivot. (Zero is never allowed as a pivot!) 
The row and column pictures of this 2 by 2 system show that failure was unavoidable. 


The row picture in Figure 2.2 shows parallel lines—which never meet. A solution 
must lie on both lines. Since there is no meeting point, the equations have no solution. 

The column picture shows the two columns in the same direction. All combinations 
of the columns lie in that direction. But the column from the right side is in a different
direction (1, 11). No combination of the columns can produce this right side—therefore 
no solution. 

With a different right side, failure shows as a whole line of solutions. Instead of no 
solution there are infinitely many: 


Example 2.2 Permanent failure with infinitely many solutions: 


     x - 2y = 1       subtract 3 times          x - 2y = 1
    3x - 6y = 3       eqn. 1 from eqn. 2            0y = 0.


Now the last equation is 0y = 0. Every y satisfies this equation. There is really only one
equation, namely x — 2y = 1. The unknown y is “free”. After y is freely chosen, then x 
is determined as x = 1+ 2y. 


In the row picture, the parallel lines have become the same line. Every point on that 
line satisfies both equations. 

Figure 2.3 Row picture and column picture with infinitely many solutions. Row picture: the same line comes from both equations, with solutions all along this line. Column picture: the right hand side lies on the line of the columns, and -1/2 (second column) = (1, 3).


In the column picture, the right side now lines up with both columns from the left 
side. We can choose x = 1 and y = 0: the first column equals the right side. We can also
choose x = 0 and y = -1/2; the second column times -1/2 equals the right side. There are
infinitely many other solutions. Every (x, y) that solves the row problem also solves the 
column problem. 

There is another way elimination can go wrong—but this time it can be fixed. Sup- 
pose the first pivot position contains zero. We refuse to allow zero as a pivot. When the 
first equation has no term involving x, we can exchange it with an equation below. With 
an acceptable pivot the process goes forward: 


Example 2.3 Temporary failure, but a row exchange produces two pivots: 


    0x + 2y = 4       exchange          3x - 2y = 5
    3x - 2y = 5       equations              2y = 4.


The new system is already triangular. This small example is ready for back substitution. 
The last equation gives y = 2, and then the first equation gives x = 3. The row picture is 
normal (two intersecting lines). The column picture is also normal (column vectors not in 
the same direction). The pivots 3 and 2 are normal—but a row exchange was required. 


Examples 2.1-2.2 are singular: there is no second pivot. Example 2.3 is nonsingular:
there is a full set of pivots. Singular equations have no solution or infinitely many
solutions. Nonsingular equations have one solution. Pivots must be nonzero because we 
have to divide by them. 
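
The three outcomes in Examples 2.1 to 2.3 can be told apart by watching the pivots. The Python sketch below is an illustration under simple assumptions, not a method given in the text; `classify_2x2` is a made-up name, and the case where the whole first column is zero is reported without further analysis.

```python
from fractions import Fraction

def classify_2x2(A, b):
    """Run elimination on a 2 by 2 system and report what happens."""
    (a11, a12), (a21, a22) = [[Fraction(v) for v in row] for row in A]
    b1, b2 = Fraction(b[0]), Fraction(b[1])
    exchanged = False
    if a11 == 0:
        if a21 == 0:
            return "singular: no first pivot"
        # Temporary failure: exchange the two equations to get a nonzero pivot.
        a11, a12, b1, a21, a22, b2 = a21, a22, b2, a11, a12, b1
        exchanged = True
    l = a21 / a11                 # multiplier
    pivot2 = a22 - l * a12        # candidate second pivot
    rhs2 = b2 - l * b1
    if pivot2 == 0:
        # Last equation reads 0y = rhs2.
        return "no solution" if rhs2 != 0 else "infinitely many solutions"
    y = rhs2 / pivot2
    x = (b1 - a12 * y) / a11
    note = " after a row exchange" if exchanged else ""
    return f"one solution ({x}, {y}){note}"

print(classify_2x2([[1, -2], [3, -6]], [1, 11]))   # Example 2.1
print(classify_2x2([[1, -2], [3, -6]], [1, 3]))    # Example 2.2
print(classify_2x2([[0, 2], [3, -2]], [4, 5]))     # Example 2.3
```

In the two singular examples the candidate second pivot is zero, and only the right side decides between no solution and infinitely many.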


Three Equations in Three Unknowns


To understand Gaussian elimination, you have to go beyond 2 by 2 systems. Three by three 
is enough to see the pattern. For now the matrices are square—an equal number of rows 
and columns. Here is a 3 by 3 system, specially constructed so that all steps lead to whole 
numbers and not fractions: 


     2x + 4y - 2z = 2
     4x + 9y - 3z = 8        (2.1)
    -2x - 3y + 7z = 10.

What are the steps? The first pivot is 2 in the upper left corner. Below that pivot we want
to create zeros. The first multiplier is the ratio l = 4/2 = 2. Subtraction removes the 4x
from the second equation:


1 Subtract 2 times equation 1 from equation 2. Then equation 2new is 0x + y + z = 4.


We also eliminate x from equation 3—still using the first pivot. The quick way is to add 
equation 1 to equation 3. Then 2x cancels —2x. We do exactly that, but the rule in this 
book is to subtract rather than add. The systematic pattern has multiplier —2/2 = —1. 
Subtracting —1 times an equation is the same as adding that equation: 


2 Subtract -1 times equation 1 from equation 3. Equation 3new is 0x + y + 5z = 12.


Now look at the situation. The two new equations involve only y and z. We have reached 
a 2 by 2 system, with y + z = 4 and y + 5z = 12. The final step eliminates y to reach a 1 
by 1 problem: 


3 Subtract equation 2new from equation 3new. Then 0x + 0y + 4z = 8.


The original 3 by 3 system has been converted into a triangular 3 by 3 system: 


 2x + 4y - 2z = 2                        2x + 4y - 2z = 2
 4x + 9y - 3z = 8      has become             y +  z = 4          (2.2)
-2x - 3y + 7z = 10                                4z = 8.


The goal is achieved— forward elimination is complete. Notice the pivots 2, 1, 4 along the 
diagonal. Those pivots 1 and 4 were hidden in the original system! Elimination brought 
them out. 

The triangular system is zero below the pivots—three elimination steps produced 
three zeros. This triangle is ready for back substitution, which is quick: 


4z = 8 gives z = 2,   y + z = 4 gives y = 2,   equation 1 gives x = -1.


The row picture shows planes, starting with the first equation 2x + 4y - 2z = 2. The
planes from all our equations go through the solution (-1, 2, 2). The original three planes
are sloping, but the very last plane 4z = 8 is horizontal.
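
The three elimination steps and the back substitution can be replayed mechanically. The following is a minimal sketch in Python, not part of the original text; the helper name `subtract_multiple` is a hypothetical choice, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

# Augmented rows [coefficient of x, of y, of z, right side] for system (2.1).
rows = [
    [Fraction(2), Fraction(4), Fraction(-2), Fraction(2)],
    [Fraction(4), Fraction(9), Fraction(-3), Fraction(8)],
    [Fraction(-2), Fraction(-3), Fraction(7), Fraction(10)],
]


def subtract_multiple(rows, i, j):
    """Subtract l times row j from row i, with l = (entry to remove) / (pivot)."""
    l = rows[i][j] / rows[j][j]
    rows[i] = [a - l * p for a, p in zip(rows[i], rows[j])]
    return l


multipliers = [
    subtract_multiple(rows, 1, 0),  # step 1: l = 4/2 = 2
    subtract_multiple(rows, 2, 0),  # step 2: l = -2/2 = -1
    subtract_multiple(rows, 2, 1),  # step 3: l = 1/1 = 1
]
pivots = [rows[k][k] for k in range(3)]

# Back substitution: the last unknown first.
z = rows[2][3] / rows[2][2]
y = (rows[1][3] - rows[1][2] * z) / rows[1][1]
x = (rows[0][3] - rows[0][1] * y - rows[0][2] * z) / rows[0][0]

print("multipliers:", [str(m) for m in multipliers])
print("pivots:", [str(p) for p in pivots])
print("solution:", (str(x), str(y), str(z)))
```

The multipliers come out as 2, -1, 1, the pivots as 2, 1, 4, and the solution as (-1, 2, 2), in agreement with the hand computation.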
The column picture shows a linear combination of column vectors producing the
right side. The coefficients in that combination are -1, 2, 2 (the solution):


\[
(-1)\begin{bmatrix} 2 \\ 4 \\ -2 \end{bmatrix}
+ 2\begin{bmatrix} 4 \\ 9 \\ -3 \end{bmatrix}
+ 2\begin{bmatrix} -2 \\ -3 \\ 7 \end{bmatrix}
\quad\text{equals}\quad
\begin{bmatrix} 2 \\ 8 \\ 10 \end{bmatrix}. \tag{2.3}
\]


The numbers x, y, z multiply columns 1, 2, 3 in the original system and also in the trian- 
gular system. 
For a 4 by 4 problem, or an n by n problem, elimination proceeds the same way: 


1 Use the first equation to create zeros below the first pivot. 
2 Use the new second equation to create zeros below the second pivot. 


3 Keep going to find the nth pivot. 


The result of forward elimination is an upper triangular system. It is nonsingular if there is 
a full set of n pivots (never zero!). Here is a final example to show the original system, the 
triangular system, and the solution from back substitution: 


x +  y +  z = 6           x + y + z = 6           x = 3
x + 2y + 2z = 9               y + z = 3           y = 2
x + 2y + 3z = 10                  z = 1           z = 1


All multipliers are 1. All pivots are 1. All planes meet at the solution (3,2, 1). The 
columns combine with coefficients 3, 2, 1 to give the right side. 
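
For an n by n system the same pattern becomes a double loop. The sketch below is an illustration under stated assumptions rather than the book's own code: the function name `solve_by_elimination` is hypothetical, no row exchanges are attempted, and a zero in a pivot position simply stops the computation (the system may still be solvable after a row exchange).

```python
from fractions import Fraction


def solve_by_elimination(A, b):
    """Forward elimination without row exchanges, then back substitution.

    Returns (pivots, x). Raises ValueError when a zero lands in a pivot position.
    """
    n = len(A)
    # Work on the rows of A with the right side attached as a last column.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    pivots = []
    for k in range(n):
        if M[k][k] == 0:
            raise ValueError(f"zero in pivot position {k + 1}")
        pivots.append(M[k][k])
        for i in range(k + 1, n):
            l = M[i][k] / M[k][k]  # multiplier for row i
            M[i] = [a - l * p for a, p in zip(M[i], M[k])]
    # Back substitution on the upper triangular system.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return pivots, x


pivots, x = solve_by_elimination([[1, 1, 1], [1, 2, 2], [1, 2, 3]], [6, 9, 10])
print([str(p) for p in pivots], [str(v) for v in x])
```

For the final example above this reports pivots 1, 1, 1 and the solution (3, 2, 1).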


Problem Set 2.1 


Problems 1-10 are about elimination on 2 by 2 systems.
1 What multiple l of equation 1 should be subtracted from equation 2?
2x+3y=1 
10x + 9y = 11. 


After this elimination step, write down the upper triangular system and circle the two 
pivots. The numbers 1 and 11 have no influence on those pivots. 


2 Solve the triangular system of Problem 1 by back substitution, y before x. Verify 
that x times (2, 10) plus y times (3, 9) equals (1, 11). If the right side changes to 
(4, 44), what is the new solution? 


3 What multiple of equation 1 should be subtracted from equation 2? 
2x - 4y = 6
-x + 5y = 0.


After this elimination step, solve the triangular system. If the right side changes to
(-6, 0), what is the new solution?
4 What multiple l of equation 1 should be subtracted from equation 2?


ax +by = f 
cx + dy = g. 


The first pivot is a (assumed nonzero). Elimination produces what formula for the 
second pivot? The second pivot is missing when ad = bc. 


5 Choose a right side which gives no solution and another right side which gives
infinitely many solutions. What are two of those solutions?


3x + 2y = 7
6x + 4y = ___


6 Choose a coefficient b that makes this system singular. Then choose a right side g
that makes it solvable.


2x + by = 13
4x + 8y = g.


7 For which numbers a does elimination break down (1) permanently (2) temporarily?


ax + 3y = -3
4x + 6y = 6.


Solve for x and y after fixing the second breakdown by a row exchange. 


8 For which three numbers k does elimination break down? In each case, is the number
of solutions 0 or 1 or ∞?


kx + 3y = 6
3x + ky = -6.


9 What is the test on b1 and b2 to decide whether these two equations have a solution?
How many solutions will they have?


3x - 2y = b1
6x - 4y = b2.


10 In the xy plane, draw the lines x + y = 5 and x + 2y = 6 and the equation y = ___
that comes from elimination. The line 5x - 4y = c will go through the solution of
these equations if c = ___.
Problems 11-20 study elimination on 3 by 3 systems (and possible failure).
11 Reduce this system to upper triangular form: 


2x + 3y + z = 1
4x + 7y + 5z = 7
-2y + 2z = 6.


Circle the pivots. Solve by back substitution for z, y, x. Two row operations are 
enough if a zero coefficient appears in which positions? 


12 Apply elimination and back substitution to solve 


2x - 3y = 3
4x - 5y + z = 7
2x - y - 3z = 5.


Circle the pivots. List the three row operations which subtract a multiple of the pivot 
row from a lower row. 


13 Which number d will force a row exchange, and what is the triangular system for
that d?


2x + 5y + z = 0
4x + dy + z = 2
y - z = 3.


Which number d makes this system singular (no third pivot)? 


14 Which number b leads later to a row exchange? Which b leads to a missing pivot? 
In that singular case find a nonzero solution x, y, z. 


x + by = 0
x - 2y - z = 0
y + z = 0.


15 (a) Construct a 3 by 3 system that needs two row exchanges to reach a triangular 
form and a solution. 


(b) Construct a 3 by 3 system that needs a row exchange to keep going, but breaks 
down later. 


16 If rows 1 and 2 are the same, how far can you get with elimination? If columns 1
and 2 are the same, which pivot is missing?

2x - y + z = 0          2x + 2y + z = 0
2x - y + z = 0          4x + 4y + z = 0
4x + y + z = 2          6x + 6y + z = 2.


17 Construct a 3 by 3 example that has 9 different coefficients on the left side, but rows
2 and 3 become zero in elimination.


18 Which number q makes this system singular and which right side t gives it infinitely
many solutions? Find the solution that has z = 1.


x + 4y - 2z = 1
x + 7y - 6z = 6
3y + qz = t.


19 (Recommended) It is impossible for a system of linear equations to have exactly two
solutions. Explain why.

(a) If (x, y, z) and (X, Y, Z) are two solutions, what is another one?

(b) If three planes meet at two points, where else do they meet?


20 How can three planes fail to have an intersection point, when no two planes are
parallel? Draw your best picture. Find a third equation that can’t be solved if
x + y + z = 0 and x - y - z = 0.


Problems 21-23 move up to 4 by 4 and n by n. 


21 Find the pivots and the solution for these four equations:


2x + y = 0
x + 2y + z = 0
y + 2z + t = 0
z + 2t = 5.


22 This system has the same pivots and right side as Problem 21. How is the solution
different (if it is)?


2x - y = 0
-x + 2y - z = 0
-y + 2z - t = 0
-z + 2t = 5.


23 If you extend Problems 21-22 following the 1, 2, 1 pattern or the -1, 2, -1 pattern,
what is the fifth pivot? What is the nth pivot?


24 If elimination leads to these equations, what are all possible original matrices A?
Row exchanges not allowed:


x + y + z = 0
y + z = 0
5z = 0.
25 For which three numbers a will elimination fail to give three pivots?

\[
A = \begin{bmatrix} a & 2 & 3 \\ a & a & 4 \\ a & a & a \end{bmatrix}
\]


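26 Look for a matrix with row sums 4, 8 and column sums 2, s:


\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\qquad
\begin{aligned} a + b &= 4 & \quad a + c &= 2 \\ c + d &= 8 & \quad b + d &= s \end{aligned}
\]


The four equations are solvable only if s = ___. Then two solutions are in the
matrices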


2.2 Elimination Using Matrices 


We now combine two ideas—elimination and matrices. The goal is to express all the steps 
of elimination (and the final result) in the clearest possible way. In a 3 by 3 example, 
elimination could be described in words. For larger systems, a long list of steps would be 
hopeless. You will see how to subtract a multiple of one row from another row—using 
matrices. 

The matrix form of a linear system is Ax = b. Here are b, x, and A: 


1 The vector of right sides is b. 


2 The vector of unknowns is x. (The unknowns change from x, y, z, ... to x1, x2,
x3, ... because we run out of letters before we run out of numbers.)


3 The coefficient matrix is A. 


The example in the previous section has the beautifully short form Ax = b: 


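\[
\begin{aligned}
2x_1 + 4x_2 - 2x_3 &= 2 \\
4x_1 + 9x_2 - 3x_3 &= 8 \\
-2x_1 - 3x_2 + 7x_3 &= 10
\end{aligned}
\quad\text{is the same as}\quad
\begin{bmatrix} 2 & 4 & -2 \\ 4 & 9 & -3 \\ -2 & -3 & 7 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 2 \\ 8 \\ 10 \end{bmatrix}. \tag{2.4}
\]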


The nine numbers on the left go into the matrix A. That matrix not only sits beside x, it 
multiplies x. The rule for “A times x” is exactly chosen to yield the three equations. 


Review of A times x A matrix times a vector gives a vector. The matrix is square when 
the number of equations (three) matches the number of unknowns (three). Our matrix is 3 
by 3. A general square matrix is n by n. Then the vector x is in n-dimensional space. This 
example is in 3-dimensional space: 


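\[
\text{The unknown is}\quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\quad\text{and the solution is}\quad
x = \begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix}.
\]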
Key point: Ax = b represents the row form and also the column form of the equations.
We can multiply Ax a column at a time:


\[
Ax = (-1)\begin{bmatrix} 2 \\ 4 \\ -2 \end{bmatrix}
+ 2\begin{bmatrix} 4 \\ 9 \\ -3 \end{bmatrix}
+ 2\begin{bmatrix} -2 \\ -3 \\ 7 \end{bmatrix}
= \begin{bmatrix} 2 \\ 8 \\ 10 \end{bmatrix}. \tag{2.5}
\]


This rule is used so often that we repeat it for emphasis. 


2A The product Ax is a combination of the columns of A. Columns are multiplied by
components of x. Then Ax is x1 times (column 1) + ... + xn times (column n).


One more point about matrix notation: The entry in row 1, column 1 (the top left
corner) is called a11. The entry in row 1, column 3 is a13. The entry in row 3, column 1
is a31. (Row number comes before column number.) The word “entry” for a matrix corre- 
sponds to the word “component” for a vector. General rule: The entry in row i, column j 
of the matrix A is aij. 


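Example 2.4 This matrix has aij = 2i + j. Then a11 = 3. Also a12 = 4 and a21 = 5.
Here is Ax with numbers and letters:


\[
\begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix}
\begin{bmatrix} 2 \\ 1 \end{bmatrix}
=
\begin{bmatrix} 3 \cdot 2 + 4 \cdot 1 \\ 5 \cdot 2 + 6 \cdot 1 \end{bmatrix}
\qquad
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
=
\begin{bmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{bmatrix}.
\]

For variety we multiplied a row at a time. The first component of Ax is 6 + 4 = 10. That is
the dot product of the row (3, 4) with the column (2, 1). Using letters, it is the dot product
of (a11, a12) with (x1, x2). The first component of Ax uses the first row of A. So the row
number in a11 and a12 stays at 1.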


The ith component of Ax involves ai1 and ai2 and ... from row i. The short formula
uses “sigma notation”:


2B The ith component of Ax is ai1 x1 + ai2 x2 + ... + ain xn. This is \(\sum_{j=1}^{n} a_{ij} x_j\).


The symbol Σ is an instruction to add. Start with j = 1 and stop with j = n. Start with
ai1 x1 and stop with ain xn.*
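
Statements 2A and 2B describe the same product in two ways. A small sketch (plain Python lists; the function names are hypothetical) computes Ax once as a combination of columns and once as row-by-row sums, using the matrix and solution from (2.4).

```python
A = [[2, 4, -2],
     [4, 9, -3],
     [-2, -3, 7]]
x = [-1, 2, 2]


def times_by_rows(A, x):
    """2B: component i of Ax is the sum over j of a_ij * x_j."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


def times_by_columns(A, x):
    """2A: Ax is x_1 times (column 1) + ... + x_n times (column n)."""
    result = [0] * len(A)
    for j, xj in enumerate(x):
        column_j = [row[j] for row in A]
        result = [r + xj * c for r, c in zip(result, column_j)]
    return result


print(times_by_rows(A, x))
print(times_by_columns(A, x))
```

Both calls give the right side (2, 8, 10).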


The Matrix Form of One Elimination Step 


Ax = b is a convenient form for the original equation (2.4). What about the elimination 
steps? The first step in this example subtracts 2 times the first equation from the second 
equation. On the right side, 2 times the first component is subtracted from the second: 


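\[
b = \begin{bmatrix} 2 \\ 8 \\ 10 \end{bmatrix}
\quad\text{changes to}\quad
b_{\text{new}} = \begin{bmatrix} 2 \\ 4 \\ 10 \end{bmatrix}.
\]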


* Einstein shortened this even more by omitting the Σ. The repeated j in aij xj automatically meant addition.
He also wrote the sum as \(a_i^j x_j\). Not being Einstein, we include the Σ. For summation Gauss used the notation [ ].
We want to do that subtraction with a matrix! The same result is achieved when we
multiply E times b:


\[
\text{The elimination matrix is}\quad
E = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]


With numbers and letters, multiplication by E subtracts 2 times row 1 from row 2: 


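\[
\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 \\ 8 \\ 10 \end{bmatrix}
= \begin{bmatrix} 2 \\ 4 \\ 10 \end{bmatrix}
\qquad
\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 - 2b_1 \\ b_3 \end{bmatrix}.
\]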


Notice how 2 and 10 stay the same. Those are b1 and b3. The first and third rows of E are
the first and third rows of the identity matrix I. That matrix leaves all vectors unchanged.
The new second component 4 is the number that appeared after the elimination step.

It is easy to describe all the “elementary matrices” or “elimination matrices” like E.
Start with the identity matrix I and change one of its zeros to -l:


2C The identity matrix has 1’s on the diagonal and 0’s everywhere else. Then Ib = b.
The elimination matrix that subtracts a multiple l of row j from row i has the extra nonzero
entry -l in the i, j position.


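Example 2.5

\[
I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad\text{and}\quad
E_{31} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -l & 0 & 1 \end{bmatrix}.
\]


If you multiply I times b, you get b again. If you multiply E31 times b, then l times the
first component is subtracted from the third component. Here we get 9 - 4 = 5:


\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 3 \\ 9 \end{bmatrix}
= \begin{bmatrix} 1 \\ 3 \\ 9 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 3 \\ 9 \end{bmatrix}
= \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}.
\]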


This is on the right side of Ax = b. What about the left side? The multiplier l = 4 was
chosen to produce a zero, by subtracting 4 times the pivot. The purpose of E31 is to create
a zero in the (3, 1) position.
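
Rule 2C translates directly into a few lines of code. This is a sketch with hypothetical helper names (`identity`, `elimination_matrix`, `mat_vec`); rows and columns are numbered from 1 to match the text, while Python lists are indexed from 0.

```python
def identity(n):
    """The n by n identity matrix: 1's on the diagonal, 0's elsewhere."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def elimination_matrix(n, i, j, l):
    """Identity matrix with the extra entry -l in position (i, j), counted from 1.

    Multiplying by it subtracts l times row j from row i.
    """
    if i == j:
        raise ValueError("i and j must be different rows")
    E = identity(n)
    E[i - 1][j - 1] = -l
    return E


def mat_vec(M, v):
    """Matrix times vector, one dot product per row."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


E21 = elimination_matrix(3, 2, 1, 2)  # subtract 2 times row 1 from row 2
E31 = elimination_matrix(3, 3, 1, 4)  # Example 2.5 with l = 4

print(mat_vec(E21, [2, 8, 10]))
print(mat_vec(E31, [1, 3, 9]))
```

The first product is (2, 4, 10), the right side after the first elimination step. The second is (1, 3, 5), as in Example 2.5.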

The notation fits the purpose. Start with A. Apply E’s to produce zeros below the 
pivots (the first E is E21). End with a triangular system. We now look in detail at the left 
side—elimination applied to Ax. 

First a small point. The vector x stays the same. The solution is not changed by 
elimination. (That may be more than a small point.) It is the coefficient matrix that is 
changed! When we start with Ax = b and multiply by E, the result is EAx = Eb. We
want the new matrix EA, the result of multiplying E times A.
Matrix Multiplication


The question is: How do we multiply two matrices? When the first matrix is E (an elimi- 
nation matrix E21), there is already an important clue. We know A, and we know what it 
becomes after the elimination step. To keep everything right, we hope and expect that 


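\[
E = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad\text{times}\quad
A = \begin{bmatrix} 2 & 4 & -2 \\ 4 & 9 & -3 \\ -2 & -3 & 7 \end{bmatrix}
\quad\text{gives}\quad
EA = \begin{bmatrix} 2 & 4 & -2 \\ 0 & 1 & 1 \\ -2 & -3 & 7 \end{bmatrix}.
\]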

This step does not change rows 1 and 3. Those rows are repeated in EA; only row 2 is
different. Twice the first row has been subtracted from the second row. Matrix multiplication
agrees with elimination, and the new system of equations is EAx = Eb.

That is simple but it hides a subtle idea. Multiplying both sides of the original 
equation gives E(Ax) = Eb. With our proposed multiplication of matrices, this is also 
(EA)x = Eb. The first was E times Ax, the second is EA times x. They are the same! 
The parentheses are not needed. We just write E Ax = Eb. 

When multiplying ABC, you can do BC first or you can do AB first. This is the 
point of an “associative law” like 3 x (4 x 5) = (3 x 4) x 5. We multiply 3 times 20, or 
we multiply 12 times 5. Both answers are 60. That law seems so obvious that it is hard 
to imagine it could be false. But the “commutative law” 3 x 4 = 4 x 3 looks even more 
obvious. For matrices, EA is different from AE.


2D Associative Law but not Commutative Law 
It is true that A(BC) = (AB)C. It is not usually true that AB equals BA. 
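
Both halves of 2D can be checked on the matrices of this section. The sketch below uses a hypothetical `mat_mul` helper built from dot products; the vector x is stored as a 3 by 1 matrix so that the same function handles E(Ax) and (EA)x.

```python
def mat_mul(A, B):
    """Entry (i, j) of AB is (row i of A) dot (column j of B)."""
    inner = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(len(B[0]))]
            for i in range(len(A))]


E = [[1, 0, 0], [-2, 1, 0], [0, 0, 1]]
A = [[2, 4, -2], [4, 9, -3], [-2, -3, 7]]
x = [[-1], [2], [2]]  # the solution, written as a column

EA = mat_mul(E, A)  # row operation: subtract 2 times row 1 from row 2
AE = mat_mul(A, E)  # column operation: subtract 2 times column 2 from column 1

print(EA)
print(AE)
print(mat_mul(E, mat_mul(A, x)) == mat_mul(EA, x))  # associative law
print(EA == AE)                                     # commutative "law"
```

E on the left acts on the rows of A, while E on the right acts on its columns, which is why the two products differ.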


There is another requirement on matrix multiplication. We know how to multiply 
a matrix times a vector (A times x or E times b). The new matrix-matrix law should be 
consistent with the old matrix-vector law. When B has only one column (this column is b), 
the matrix-matrix product EB should agree with Eb. Even more, we should be able to
multiply matrices a column at a time: 


If the matrix B contains several columns b1, b2, b3,
then the columns of EB should be Eb1, Eb2, Eb3.


This holds true for the matrix multiplication above (where we have A instead of B). If you 
multiply column 1 by E, you get column 1 of the answer: 


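\[
\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 \\ 4 \\ -2 \end{bmatrix}
= \begin{bmatrix} 2 \\ 0 \\ -2 \end{bmatrix}
\quad\text{and similarly for columns 2 and 3.}
\]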


This requirement deals with columns, while elimination deals with rows. A third approach 
(in the next section) describes each individual entry of the product. The beauty of matrix 
multiplication is that all three approaches (rows, columns, whole matrices) come out right. 
The Matrix Pij for a Row Exchange


To subtract row j from row i we use Eij. To exchange or “permute” those rows we use
another matrix Pij. Remember the situation when row exchanges are needed: Zero is in
the pivot position. Lower down that column is a nonzero. By exchanging the two rows, we
have a pivot (nonzero!) and elimination goes forward.

What matrix P23 exchanges row 2 with row 3? We can find it by exchanging rows 
of I: 


\[
P_{23} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}.
\]


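This is a row exchange matrix. Multiplying by P23 exchanges components 2 and 3 of any
column vector. Therefore it exchanges rows 2 and 3 of any matrix:


\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}
= \begin{bmatrix} 1 \\ 5 \\ 3 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 4 & 1 \\ 0 & 0 & 3 \\ 0 & 6 & 5 \end{bmatrix}
= \begin{bmatrix} 2 & 4 & 1 \\ 0 & 6 & 5 \\ 0 & 0 & 3 \end{bmatrix}.
\]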


On the right, P23 is doing what it was created for. With zero in the second pivot position 
and “6” below it, the exchange puts 6 into the pivot. 

Notice how matrices act. They don’t just sit there. We will soon meet other permu- 
tation matrices, which can put any number of rows into a different order. Rows 1, 2, 3 can 
be moved to 3, 1, 2. Our P23 is one particular permutation matrix—it works on two rows. 


2E Permutation Matrix Pij exchanges rows i and j when it multiplies a matrix. Pij is
the identity matrix with rows i and j reversed.

\[
\text{To exchange equations 1 and 3 multiply by}\quad
P_{13} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}.
\]

Usually no row exchanges are needed. The odds are good that elimination uses only
the Eij. But the Pij are ready if needed, to put a new pivot into position.
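
A row exchange matrix is easy to build by swapping two rows of I, as 2E says. The following sketch (helper names are hypothetical, rows counted from 1) rebuilds P23 and applies it to the matrix with a zero in the second pivot position.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def exchange_matrix(n, i, j):
    """Identity matrix with rows i and j reversed (rows counted from 1)."""
    P = identity(n)
    P[i - 1], P[j - 1] = P[j - 1], P[i - 1]
    return P


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


P23 = exchange_matrix(3, 2, 3)
A = [[2, 4, 1],
     [0, 0, 3],
     [0, 6, 5]]

print(P23)
print(mat_mul(P23, A))                   # rows 2 and 3 exchanged: 6 becomes the pivot
print(mat_mul(P23, P23) == identity(3))  # exchanging twice restores the original order
```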


The Augmented Matrix A’ 


This book eventually goes far beyond elimination. Matrices have all kinds of practical 
applications—in which they are multiplied. Our best starting point was a square E times 
a square A, because we met this in elimination—and we know what answer to expect 
for EA. The next step is to allow a rectangular matrix A’. It still comes from our original 
equations, so we still have a check on the product E A’. 

Key idea: The equal signs and the letters x, y, z are not really involved in elimination. 
They don’t change and they don’t move. Elimination acts on A and b, and it does the same
thing to both. We can include b as an extra column and follow it through elimination. 
The matrix A is enlarged or “augmented” by b: 


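\[
\text{Augmented matrix}\quad
A' = \begin{bmatrix} A & b \end{bmatrix}
= \begin{bmatrix} 2 & 4 & -2 & 2 \\ 4 & 9 & -3 & 8 \\ -2 & -3 & 7 & 10 \end{bmatrix}.
\]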
A’ places the left side and right side next to each other. Elimination acts on whole rows
of A’. The left side and right side are both multiplied by E, to subtract 2 times equation 1
from equation 2. With A’ those steps happen together:


\[
\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 4 & -2 & 2 \\ 4 & 9 & -3 & 8 \\ -2 & -3 & 7 & 10 \end{bmatrix}
= \begin{bmatrix} 2 & 4 & -2 & 2 \\ 0 & 1 & 1 & 4 \\ -2 & -3 & 7 & 10 \end{bmatrix}
= EA'.
\]


The new second row contains 0, 1, 1, 4. The new second equation is x2 + x3 = 4. Both 
requirements on matrix multiplication are obeyed: 


R (by rows): Each row of E acts on A’ to give a row of EA’. 
C (by columns): E acts on each column of A’ to give a column of EA’. 


Notice again that word “acts.” This is essential. Matrices do something! The matrix A acts 
on x to give b. The matrix E operates on A to give EA. The whole process of elimination 
is a sequence of row operations, alias matrix multiplications. A goes to E21 A which goes
to E31 E21 A which goes to E32 E31 E21 A which is a triangular matrix. 

The right side is included when we work with A’. Then A’ goes to E21 A’ which goes 
to E31 E21 A’ which goes to E32 E31 E21 A’ which is a triangular system of equations. 
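
The chain A’ to E21 A’ to E31 E21 A’ to E32 E31 E21 A’ can be followed numerically for the running example. In this sketch the three elimination matrices are typed in from the multipliers 2, -1, 1 found in Section 2.1 (so the extra entries are -2, +1, -1); `mat_mul` is a hypothetical helper.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


A_aug = [[2, 4, -2, 2],
         [4, 9, -3, 8],
         [-2, -3, 7, 10]]

E21 = [[1, 0, 0], [-2, 1, 0], [0, 0, 1]]  # subtract 2 times row 1 from row 2
E31 = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]   # subtract -1 times row 1 from row 3
E32 = [[1, 0, 0], [0, 1, 0], [0, -1, 1]]  # subtract 1 times row 2 from row 3

step1 = mat_mul(E21, A_aug)
step2 = mat_mul(E31, step1)
U_aug = mat_mul(E32, step2)

# By the associative law the three steps combine into one matrix M.
M = mat_mul(E32, mat_mul(E31, E21))

print(U_aug)
print(M)
print(mat_mul(M, A_aug) == U_aug)
```

The last column of the result carries the new right side, so the triangular system 2x + 4y - 2z = 2, y + z = 4, 4z = 8 can be read off directly.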


We stop for exercises on multiplication by E, before writing down the rules for all matrix 
multiplications. 


Problem Set 2.2 


Problems 1-14 are about elimination matrices. 
1 Write down the 3 by 3 matrices that produce these elimination steps: 


(a) E21 subtracts 5 times row 1 from row 2.
(b) E32 subtracts -7 times row 2 from row 3.


(c) P12 exchanges rows 1 and 2.


2 In Problem 1, applying E21 and then E32 to the column b = (1, 0, 0) gives
E32 E21 b = ___. Applying E32 before E21 gives E21 E32 b = ___. E21 E32
is different from E32 E21 because when E32 comes first, row ___ feels no effect
from row ___.


3 Which three matrices E21, E31, E32 put A into triangular form U?


\[
A = \begin{bmatrix} 1 & 1 & 0 \\ 4 & 6 & 1 \\ -2 & 2 & 0 \end{bmatrix}
\quad\text{and}\quad
E_{32} E_{31} E_{21} A = U.
\]
—2 2 O 


Multiply those E’s to get one matrix M that does elimination: MA = U. 
4 Include b = (1, 0, 0) as a fourth column in Problem 3 to produce A’. Carry out the
elimination steps on this augmented matrix A’ to solve Ax = b.


5 Suppose a 3 by 3 matrix has a33 = 7 and its third pivot is 2. If you change a33 to 11,
the third pivot is ___. If you change a33 to ___, there is no third pivot.


6 If every column of A is a multiple of (1, 1, 1), then Ax is always a multiple of
(1, 1, 1). Do a 3 by 3 example. How many pivots are produced by elimination?


7 Suppose E31 subtracts 7 times row 1 from row 3. To reverse that step you should
___ 7 times row ___ to row ___. The matrix to do this reverse step (the
inverse matrix) is R31 = ___.


8 Suppose E31 subtracts 7 times row 1 from row 3. What matrix R31 is changed into I?
Then E31 R31 = I where Problem 7 has R31 E31 = I.


9 (a) E21 subtracts row 1 from row 2 and then P23 exchanges rows 2 and 3. What
matrix M = P23 E21 does both steps at once?


(b) P23 exchanges rows 2 and 3 and then E31 subtracts row 1 from row 3. What
matrix M = E31 P23 does both steps at once? Explain why the M’s are the
same but the E’s are different.


10 Create a matrix that has a11 = a22 = a33 = 1 but elimination produces two negative
pivots. (The first pivot is 1.)


11 Multiply these matrices:


; 4 a J 
—-1 1 O 
5 1J{0 0 104 


12 Explain these facts. If the third column of B is all zero, the third column of EB is
all zero (for any E). If the third row of B is all zero, the third row of EB might not
be zero.


e UN 


3 
1 
0 


13 This 4 by 4 matrix will need elimination matrices E21 and E32 and E43. What are
those matrices?


\[
A = \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}
\]


14 Write down the 3 by 3 matrix that has aij = 2i - 3j. This matrix has a32 = 0,
but elimination still needs E32 to produce a zero. Which previous step destroys the
original zero and what is E32?
Problems 15-22 are about creating and multiplying matrices.


15 Write these ancient problems in a 2 by 2 matrix form Ax = b:


(a) X is twice as old as Y and their ages add to 33. 
(b) The line y = mx +c goes through (x, y) = (2, 5) and (3, 7). Find m and c. 


16 The parabola y = a + bx + cx^2 goes through the points (x, y) = (1, 4) and (2, 8)
and (3, 14). Find a matrix equation for the unknowns (a, b, c). Solve by elimination.


17 Multiply these matrices in the orders EF and FE:


= O © 
Y 
| 
oo = 
Oo å- O 
= O © 


1 0 

B= pa] 

b 0 

Also compute E^2 = EE and F^2 = FF.


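18 Multiply these row exchange matrices in the orders PQ and QP:

\[
P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad\text{and}\quad
Q = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}.
\]


Also compute P^2 = PP and (PQ)^2 = PQPQ.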

19 (a) Suppose all columns of B are the same. Then all columns of EB are the same,
because each one is E times ___.

(b) Suppose all rows of B are the same. Show by example that all rows of EB are
not the same.


20 If E adds row 1 to row 2 and F adds row 2 to row 1, does EF equal FE?


21 The entries of A and x are aij and xj. The matrix E21 subtracts row 1 from row 2.
Write a formula for

(a) the third component of Ax

(b) the (2, 1) entry of EA

(c) the (2, 1) component of Ex

(d) the first component of EAx.


22 The elimination matrix \(E = \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}\) subtracts 2 times row 1 of A from row 2 of A.
The result is EA. In the opposite order AE, we are subtracting 2 times ___ of A
from ___. (Do an example.)
Problems 23-26 include the column b in the augmented matrix A’.


23 Apply elimination to the 2 by 3 augmented matrix A’. What is the triangular system
Ux = c? What is the solution x?


arm {i altel Ln] 


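24 Apply elimination to the 3 by 4 augmented matrix A’. How do you know this system
has no solution?


\[
Ax = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 5 & 7 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \begin{bmatrix} 1 \\ 2 \\ 6 \end{bmatrix}.
\]

Change the last number 6 so that there is a solution.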


25 The equations Ax = b and Ax* = b* have the same matrix A.


(a) What double augmented matrix A” should you use in elimination to solve both 
equations at once? 


(b) Solve both of these equations by working on a 2 by 4 matrix A”: 


e abd- e Le B-b] 


26 Choose the numbers a, b, c, d in this augmented matrix so that there is (a) no solution
(b) infinitely many solutions.


\[
A' = \begin{bmatrix} 1 & 2 & 3 & a \\ 0 & 4 & 5 & b \\ 0 & 0 & d & c \end{bmatrix}
\]

Which of the numbers a, b, c, or d have no effect on the solvability?


27 Challenge question: Eij is the 4 by 4 identity matrix with an extra 1 in the (i, j)
position, i > j. Describe the matrix Eij Ekl. When does it equal Ekl Eij? Try
examples first.


2.3 Rules for Matrix Operations 


A matrix is a rectangular array of numbers or “entries.” When A has m rows and n columns, 
it is an “m by n” matrix. The upper left entry is a11, the upper right entry is a1n. (The lower
left is am1 and the lower right is amn.) Matrices can be added if their shapes are the same.
They can be multiplied by any constant c. Here are examples of A + B and 2A, for 3 by 2 
matrices: 


[Displayed example of A + B and 2A for 3 by 2 matrices; the entries are illegible in this copy.]
[Diagram: row i of the 4 by 5 matrix A, with entries ai1, ai2, ..., ai5, multiplies column j
of the 5 by 6 matrix B, with entries b1j, b2j, ..., b5j, to give the entry (AB)ij of the
4 by 6 product.]


Figure 2.4 With i = 2 and j = 3, (AB)23 is (row 2) · (column 3).


Matrices are added exactly as vectors are—one entry at a time. We could even regard 
vectors as special matrices with only one column (so n = 1). The matrix —A comes from 
multiplication by c = —1 (reversing all the signs). Adding A to —A leaves the zero matrix, 
with all entries zero. 

The 3 by 2 zero matrix is different from the 2 by 3 zero matrix. Even 0 has a shape 
(several shapes). All this is only common sense. 


The serious question is matrix multiplication. When can we multiply A times B, and what 
is the product AB? We cannot multiply A and B above (both 3 by 2). They don’t pass 
the following test, which makes the multiplication possible. The number of columns of A 
must equal the number of rows of B. If A has two columns, B must have two rows.
When A is 3 by 2, the matrix B can be 2 by 1 (a vector) or 2 by 2 (square) or 2 by 20. 
When there are twenty columns in B, each column is ready to be multiplied by A. Then 
AB is 3 by 20. 

Suppose A is m by n and B is n by p. We can multiply. The product AB is m by p. 


\[
\begin{bmatrix} m \text{ rows} \\ n \text{ columns} \end{bmatrix}
\begin{bmatrix} n \text{ rows} \\ p \text{ columns} \end{bmatrix}
=
\begin{bmatrix} m \text{ rows} \\ p \text{ columns} \end{bmatrix}.
\]

The dot product is an extreme case. Then 1 by n multiplies n by 1. The result is 1 by 1.
That single number is the dot product.

In every case AB is filled with dot products. For the top corner, (row 1 of A) ·
(column 1 of B) gives the (1, 1) entry of AB. To multiply matrices, take all these dot
products: (each row of A) · (each column of B).


2F The entry in row i and column j of AB is (row i of A) · (column j of B).


Figure 2.4 picks out the second row (i = 2) of a 4 by 5 matrix A. It picks out the third 
column (j = 3) of a 5 by 6 matrix B. Their dot product goes into row 2 and column 3
of AB. The matrix AB has as many rows as A (4 rows), and as many columns as B. 


Example 2.6 Square matrices can be multiplied if and only if they have the same size:


\[
\begin{bmatrix} 1 & 1 \\ 2 & -1 \end{bmatrix}
\begin{bmatrix} 2 & 2 \\ 3 & 4 \end{bmatrix}
= \begin{bmatrix} 5 & 6 \\ 1 & 0 \end{bmatrix}.
\]

The first dot product is 1 · 2 + 1 · 3 = 5. Three more dot products give 6, 1, and 0. Each
dot product requires two multiplications, thus eight in all.
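
Rule 2F and the shape test can be written as one short routine. This sketch (the function name `multiply_counting` is hypothetical) fills AB with dot products, refuses shapes that do not match, and counts the scalar multiplications it performs.

```python
def multiply_counting(A, B):
    """Return (AB, number of scalar multiplications) using rule 2F.

    A is m by n and B is n by p; otherwise the product is not defined.
    """
    inner = len(B)
    if any(len(row) != inner for row in A):
        raise ValueError("columns of A must equal rows of B")
    count = 0
    AB = []
    for row in A:                      # each row of A
        new_row = []
        for j in range(len(B[0])):     # each column of B
            entry = 0
            for k in range(inner):
                entry += row[k] * B[k][j]
                count += 1
            new_row.append(entry)
        AB.append(new_row)
    return AB, count


product, count = multiply_counting([[1, 1], [2, -1]], [[2, 2], [3, 4]])
print(product, count)
```

For Example 2.6 this gives the product with rows (5, 6) and (1, 0), using eight multiplications.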
If A and B are n by n, so is AB. It contains n^2 dot products, each needing n
multiplications. The computation of AB uses n^3 separate multiplications. For n = 100 we
multiply a million times. For n = 2 we have n^3 = 8.

Mathematicians thought until recently that AB absolutely needed 2^3 = 8 multiplications.
Then somebody found a way to do it with 7. By breaking n by n matrices into
2 by 2 blocks, this idea also reduced the count for large matrices. Instead of n^3 it went
below n^2.8, and the exponent keeps falling.* The best at this moment is n^2.376. But the
algorithm is so awkward that scientific computing is done the regular way, with n^2 dot
products and n multiplications each.
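
The text does not name the 7-multiplication method. The scheme usually credited to Strassen (1969) is sketched below for a single 2 by 2 product, as an illustration only; the intermediate names m1, ..., m7 and the function name are our own labels, and the price of the saved multiplication is a larger number of additions.

```python
def strassen_2x2(A, B):
    """Multiply two 2 by 2 matrices with 7 multiplications instead of 8."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]


print(strassen_2x2([[1, 1], [2, -1]], [[2, 2], [3, 4]]))
```

Applied to the matrices of Example 2.6 it reproduces the product with rows (5, 6) and (1, 0). When the entries are themselves matrix blocks, the same seven formulas apply recursively, which is how the exponent drops below 3.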


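Example 2.7 Suppose A is a row vector (1 by 3) and B is a column vector (3 by 1). Then
AB is 1 by 1 (only one entry, the dot product). On the other hand B times A is a full
matrix:


\[
\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}
\begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}
= \begin{bmatrix} 8 \end{bmatrix}
\quad\text{but}\quad
\begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 0 \\ 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}.
\]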


A row times a column is an “inner” product—another name for dot product. A column 
times a row is an “outer” product. These are extreme cases of matrix multiplication, with 
very thin matrices. They follow the rule for shapes in multiplication: (n by 1) times (1 by n) 
is (n by n). 


Rows and Columns of AB 


In the big picture, A multiplies each column of B. When A multiplies a vector, we are 
combining the columns of A. Each column of AB is a combination of the columns of A. 
That is the column picture of matrix multiplication: 


Column of AB is (matrix A) times (column of B) = combination of columns of A. 


Look next at the row picture—which is reversed. Each row of A multiplies the whole 
matrix B. The result is a row of AB. It is a combination of the rows of B: 


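\[
\begin{bmatrix} \text{row } i \text{ of } A \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
= \begin{bmatrix} \text{row } i \text{ of } AB \end{bmatrix}.
\]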


We see row operations in elimination (E times A). We see columns in A times x. The 
“row-column picture” is the usual one—dot products of rows with columns. Believe it or 
not, there is also a “column-row picture.” Not everybody knows that columns multiply 
rows to give the same answer AB. This is in Example 2.8 below. 
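
The four pictures of AB (dot products, columns, rows, columns times rows) can be compared on a small rectangular example. The matrices and helper names in this sketch are illustrative choices, not taken from the text.

```python
A = [[1, 2],
     [3, 4],
     [5, 6]]       # 3 by 2
B = [[1, 0, 2],
     [-1, 3, 1]]   # 2 by 3


def column(M, j):
    return [row[j] for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Row-column picture: entry (i, j) is (row i of A) dot (column j of B).
by_dots = [[dot(row, column(B, j)) for j in range(3)] for row in A]

# Column picture: column j of AB is A times (column j of B).
cols = [[dot(row, column(B, j)) for row in A] for j in range(3)]
by_columns = [[cols[j][i] for j in range(3)] for i in range(3)]

# Row picture: row i of AB is a combination of the rows of B.
by_rows = []
for row in A:
    combo = [0, 0, 0]
    for k, coefficient in enumerate(row):
        combo = [c + coefficient * b for c, b in zip(combo, B[k])]
    by_rows.append(combo)

# Column-row picture: add (column k of A) times (row k of B) over all k.
by_outer = [[0, 0, 0] for _ in range(3)]
for k in range(2):
    outer = [[a * b for b in B[k]] for a in column(A, k)]
    by_outer = [[s + t for s, t in zip(r1, r2)] for r1, r2 in zip(by_outer, outer)]

print(by_dots)
print(by_dots == by_columns == by_rows == by_outer)
```

All four computations produce the same 3 by 3 matrix.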


*Maybe the exponent won’t stop falling before 2. No number in between looks special. 
The Laws for Matrix Operations


May we put on record six laws that matrices do obey, while emphasizing an equation 
they don’t obey? The matrices can be square or rectangular, and the laws involve three 
operations: 


1 Multiply A by a scalar to get cA. 
2 Add matrices (same size) to get A + B. 
3 Multiply matrices (right sizes) to get AB or BA. 


You know the right sizes for AB: (m by n) multiplies (n by p) to produce (m by p). 
The laws involving A + B are all simple and all obeyed. Here are three addition 
laws: 


A + B = B + A                  (commutative law)
c(A + B) = cA + cB             (distributive law)
A + (B + C) = (A + B) + C      (associative law).


Three more laws hold for multiplication, but AB = BA is not one of them:


AB ≠ BA                        (the commutative “law” is usually broken)
C(A + B) = CA + CB             (distributive law from the left)
(A + B)C = AC + BC             (distributive law from the right)
A(BC) = (AB)C                  (associative law)(parentheses not needed).


When A and B are not square, AB is a different size from BA. These matrices can’t be 
equal—even if both multiplications are allowed. For square matrices, almost any example 
shows that AB is different from BA: 


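\[
AB = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
\quad\text{but}\quad
BA = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.
\]

It is true that AI = IA. All square matrices commute with I and also with B = cI. Only
these matrices cI commute with all other matrices.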

The law A(B+C) = AB + AC is proved a column at a time. Start with A (b +c) = 
Ab + Ac for the first column. That is the key to everything—linearity. Say no more. 

The law A(BC) = (AB)C means that you can multiply BC first or AB first (Problems
4 and 14). Look at the special case when A = B = C = square matrix. Then
(A times A^2) = (A^2 times A). The product in either order is A^3. All the matrix powers A^p
commute with each other, and they follow the same rules as numbers:


A^p = AAA ··· A (p factors)      (A^p)(A^q) = A^(p+q)      (A^p)^q = A^(pq).


Those are the ordinary laws for exponents. A^3 times A^4 is A^7 (seven factors). A^3 to the
fourth power is A^12 (twelve A’s). When p and q are zero or negative these rules still hold,
provided A has a “-1 power”, which is the inverse matrix A^-1. Then A^0 = I is the
identity matrix (no factors).
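
The exponent laws are easy to spot-check for nonnegative powers. The matrix below is an arbitrary illustrative choice and the helper `power` is hypothetical; negative powers are left out because they need the inverse matrix, which comes later.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def power(A, p):
    """A^p as a product of p factors; A^0 is the identity (no factors)."""
    if p < 0:
        raise ValueError("negative powers need the inverse matrix")
    result = identity(len(A))
    for _ in range(p):
        result = mat_mul(result, A)
    return result


A = [[2, 1],
     [1, 1]]

print(mat_mul(power(A, 3), power(A, 4)) == power(A, 7))   # (A^p)(A^q) = A^(p+q)
print(power(power(A, 3), 4) == power(A, 12))              # (A^p)^q = A^(pq)
print(mat_mul(power(A, 2), power(A, 5)) == mat_mul(power(A, 5), power(A, 2)))
```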

For a number, a^-1 is 1/a. For a matrix, the inverse is written A^-1. (It is never I/A;
MATLAB would accept A\I.) Every number has an inverse except a = 0. To decide when
A has an inverse is a central problem in linear algebra. Section 2.4 will start on the answer. 
This section is a Bill of Rights for matrices, to say when A and B can be multiplied and 
how. 


Block Matrices and Block Multiplication


We have to say one more thing about matrices. They can be broken into blocks (which are
smaller matrices). This often happens naturally. Here is a 4 by 6 matrix broken into blocks
of size 2 by 2, and each block is just I:


$$A = \left[\begin{array}{cc|cc|cc} 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ \hline 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \end{array}\right] = \begin{bmatrix} I & I & I \\ I & I & I \end{bmatrix}.$$


If B is also 4 by 6 and its block sizes match the block sizes in A, you can add A + B a
block at a time.

We have seen block matrices before. The right side vector b was placed next to A in
the “augmented matrix.” Then A′ = [A b] has two column blocks. Multiplying A′ by an
elimination matrix gave [EA Eb]. No problem to multiply blocks times blocks, when
their shapes permit:


2G Block multiplication If the cuts between columns of A match the cuts between rows 
of B, then block multiplication of AB is allowed: 


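$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & \cdots \\ B_{21} & \cdots \end{bmatrix} = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & \cdots \\ A_{21}B_{11} + A_{22}B_{21} & \cdots \end{bmatrix}. \qquad (2.6)$$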


This equation is the same as if the blocks were numbers (which are 1 by 1 blocks). We are 
careful to keep A’s in front of B’s, because BA can be different. The cuts between rows 
of A give cuts between rows of AB. Any column cuts in B are also column cuts in AB. 
Here are the block sizes for equation (2.6) to go through: 


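$$\begin{bmatrix} m_1 \text{ by } n_1 & m_1 \text{ by } n_2 \\ m_2 \text{ by } n_1 & m_2 \text{ by } n_2 \end{bmatrix} \begin{bmatrix} n_1 \text{ by } p_1 & \cdots \\ n_2 \text{ by } p_1 & \cdots \end{bmatrix} = \begin{bmatrix} m_1 \text{ by } p_1 & \cdots \\ m_2 \text{ by } p_1 & \cdots \end{bmatrix}.$$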


The column cuts in A must match the row cuts in B. Then the blocks will multiply. 


Main point When matrices split into blocks, it is often simpler to see how they act. The
block matrix of I’s above is much clearer than the original 4 by 6 matrix.
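
As a small illustration of block multiplication, the sketch below cuts a 3 by 4 matrix A into two column blocks and a 4 by 2 matrix B into two matching row blocks, then compares A₁B₁ + A₂B₂ with the ordinary product. The matrices and helper names are assumptions chosen for the example:

```python
def matmul(A, B):
    """Rows-times-columns product; matrices are stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]          # 3 by 4, column cut after column 2
B = [[1, 0],
     [2, 1],
     [0, 3],
     [4, 1]]                   # 4 by 2, row cut after row 2

A1 = [row[:2] for row in A]    # 3 by 2 block
A2 = [row[2:] for row in A]    # 3 by 2 block
B1, B2 = B[:2], B[2:]          # two 2 by 2 blocks

blockwise = add(matmul(A1, B1), matmul(A2, B2))
print(blockwise == matmul(A, B))
```

The column cut in A (after column 2) matches the row cut in B (after row 2), which is what allows the blocks to multiply.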

Example 2.8 (Important) Let the blocks of A be its columns. Let the blocks of B be its
rows. Then block multiplication AB is columns times rows:


$$AB = \begin{bmatrix} | & & | \\ a_1 & \cdots & a_n \\ | & & | \end{bmatrix} \begin{bmatrix} - & b_1 & - \\ & \vdots & \\ - & b_n & - \end{bmatrix} = \begin{bmatrix} a_1 b_1 + \cdots + a_n b_n \end{bmatrix}. \qquad (2.7)$$


This is another way to multiply matrices! Compare it with the usual way, rows times 
columns. Row 1 of A times column 1 of B gave the (1, 1) entry in AB. Now column 1 
of A times row 1 of B gives a full matrix—not just a single number. Look at this example: 


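$$\begin{bmatrix} 1 & 4 \\ 1 & 5 \end{bmatrix} \begin{bmatrix} 3 & 2 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} 3 & 2 \end{bmatrix} + \begin{bmatrix} 4 \\ 5 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} = \begin{bmatrix} 3 & 2 \\ 3 & 2 \end{bmatrix} + \begin{bmatrix} 4 & 0 \\ 5 & 0 \end{bmatrix}. \qquad (2.8)$$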


We stop there so you can see columns multiplying rows. If an m by 1 matrix (a column) 
multiplies a 1 by p matrix (a row), the result is m by p (a matrix). That is what we found. 
Dot products are “inner products,” these are “outer products.” 

When you add the two matrices at the end of equation (2.8), you get the correct 
answer AB. In the top left corner the answer is 3 + 4 = 7. This agrees with the row- 
column dot product of (1, 4) with (3, 1). 


Summary The usual way, rows times columns, gives four dot products (8 multiplications). 
The new way, columns times rows, gives two full matrices (8 multiplications). The eight 
multiplications are the same, just the order is different. 
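
A sketch of columns times rows in plain Python. The matrices are chosen so that row 1 of A is (1, 4) and column 1 of B is (3, 1), as in the text; the remaining entries and all helper names are assumptions:

```python
def matmul(A, B):
    """Usual way: rows times columns (dot products)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def outer(col, row):
    """Column times row: an m by 1 times a 1 by p gives a full m by p matrix."""
    return [[c * r for r in row] for c in col]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

A = [[1, 4], [1, 5]]
B = [[3, 2], [1, 0]]

columns_of_A = list(zip(*A))          # (1, 1) and (4, 5)
pieces = [outer(c, r) for c, r in zip(columns_of_A, B)]

total = pieces[0]
for piece in pieces[1:]:
    total = add(total, piece)

print(pieces, total, total == matmul(A, B))
```

Each outer product is a full matrix, and their sum reproduces the row-column answer, including 3 + 4 = 7 in the top left corner.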


Example 2.9 (Elimination by blocks) Suppose the first column of A contains 2, 6, 8. To
change 6 and 8 to 0 and 0, multiply the pivot row by 3 and 4 and subtract. Each elimination
step is really a multiplication by an elimination matrix Eᵢⱼ:


$$E_{21} = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad E_{31} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}.$$


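The “block idea” is to do both eliminations with one matrix E. Multiply E₂₁ and E₃₁ to
find that matrix E, which clears out the whole first column below the pivot:


$$E = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix} \quad\text{multiplies } A \text{ to give}\quad EA = \begin{bmatrix} 2 & x & x \\ 0 & x & x \\ 0 & x & x \end{bmatrix}.$$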


Block multiplication gives a formula for those x’s in EA. The matrix A has four blocks a, 
b, c, D: a number, the rest of a row, the rest of a column, and the rest of the matrix. Watch 
how they multiply: 


$$EA = \left[\begin{array}{c|c} 1 & 0 \\ \hline -c/a & I \end{array}\right] \left[\begin{array}{c|c} a & b \\ \hline c & D \end{array}\right] = \left[\begin{array}{c|c} a & b \\ \hline 0 & D - cb/a \end{array}\right].$$

The pivot is a = 2. The column under it is $c = \begin{bmatrix} 6 \\ 8 \end{bmatrix}$. Elimination multiplies a by c/a and
subtracts from c to get zeros in the first column. It multiplies the column vector c/a times
the row vector b, and subtracts to get D − cb/a. This is ordinary elimination, written with
vectors, a column at a time.
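
The block formula can be tried on a concrete matrix. In this sketch only the first column (2, 6, 8) comes from the example; the other entries of A and the variable names are assumptions:

```python
from fractions import Fraction

def matmul(A, B):
    """Rows-times-columns product; matrices are stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[2, 1, 1],
     [6, 5, 4],
     [8, 7, 9]]                       # first column 2, 6, 8; rest assumed
E = [[1, 0, 0],
     [-3, 1, 0],
     [-4, 0, 1]]
EA = matmul(E, A)

# The four blocks of A: a number, the rest of a row, the rest of a column, the rest.
a = Fraction(A[0][0])
b = A[0][1:]
c = [row[0] for row in A[1:]]
D = [row[1:] for row in A[1:]]

# D - c b / a, computed entry by entry (column c/a times row b is an outer product).
D_minus_cb_over_a = [[D[i][j] - c[i] * b[j] / a for j in range(len(b))]
                     for i in range(len(c))]
lower_right_of_EA = [row[1:] for row in EA[1:]]

print(EA, D_minus_cb_over_a == lower_right_of_EA)
```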


Problem Set 2.3


Problems 1-17 are about the laws of matrix multiplication. 


1 Suppose A is 3 by 5, B is 5 by 3, C is 5 by 1, and D is 3 by 1. Which of these matrix 
operations are allowed, and what are the shapes of the results? 


(a) BA
(b) A(B + C)
(c) ABD
(d) AC + BD
(e) ABABD.


2 What rows or columns and what matrices do you multiply to find 


(a) the third column of AB? 
(b) the first row of AB? 
(c) the entry in row 3, column 4 of AB? 


(d) the entry in row 1, column 1 of CDE? 


3 Compute AB + AC and separately A(B + C) and compare: 


1 5 0. 2 3 1 
a=; 4 and as 5 i and cel ai 


4 In Problem 3, multiply A times BC. Then multiply AB times C. 


5 Compute A? and A?. Make a prediction for A” and A”: 


1 b 2. 2 
a=[t 2] ant a=[? 2] 


6 Show that (A + B)² is different from A² + 2AB + B², when


1 2 1 0 
a= and s=; 


Write down the correct rule for (A + B)(A + B) = A² + AB + ____.


7 True or false. Give a specific example when false:


(a) If columns 1 and 3 of B are the same, so are columns 1 and 3 of AB.
(b) If rows 1 and 3 of B are the same, so are rows 1 and 3 of AB.
(c) If rows 1 and 3 of A are the same, so are rows 1 and 3 of AB.
(d) (AB)² = A²B².


8 How are the rows of DA and EA related to the rows of A, when


3 0 0 1 
— = 9 
D f 4 and E i 5 


How are the columns of AD and AE related to the columns of A?


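9 Row 1 of A is added to row 2. This gives EA below. Then column 1 of EA is added
to column 2 to produce (EA)F:


$$EA = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a & b \\ a+c & b+d \end{bmatrix}$$

$$\text{and}\quad (EA)F = (EA) \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} a & a+b \\ a+c & a+c+b+d \end{bmatrix}.$$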


(a) Do those steps in the opposite order. First add column 1 of A to column 2 
by AF, then add row 1 of AF to row 2 by E(AF). 
(b) Compare with (EA) F. What law is or is not obeyed by matrix multiplication? 


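10 Row 1 of A is added to row 2 to produce EA. Then adding row 2 of EA to row 1
gives F(EA):


$$F(EA) = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a & b \\ a+c & b+d \end{bmatrix} = \begin{bmatrix} 2a+c & 2b+d \\ a+c & b+d \end{bmatrix}.$$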


(a) Do those steps in the opposite order: first row 2 to row 1 by FA, then row 1 
of FA to row 2. 


(b) What law is or is not obeyed by matrix multiplication? 

11 (3 by 3 matrices) Choose B so that for every matrix A (if possible)

(a) BA = 4A
(b) BA = 4B
(c) BA has rows 1 and 3 of A reversed
(d) All rows of BA are the same.


12 Suppose AB = BA and AC = CA for two particular matrices B and C:


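$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad\text{commutes with}\quad B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$


Prove that a = d and b = c = 0. Then A is a multiple of I. Only the matrices
A = aI commute with all other 2 by 2 matrices (like B and C).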


13 Which of the following matrices are guaranteed to equal (A − B)²: A² − B²,
(B − A)², A² − 2AB + B², A(A − B) − B(A − B), A² − AB − BA + B²?


14 True or false:


(a) If A² is defined then A is square.
(b) If AB and BA are defined then A and B are square.
(c) If AB and BA are defined then AB and BA are square.
(d) If AB = B then A = I.


15 If A is m by n, how many separate multiplications are involved when

(a) A multiplies a vector x with n components?
(b) A multiplies an n by p matrix B?
(c) A multiplies itself to produce A²? Here m = n.


16 To prove that (AB)C = A(BC), use the column vectors b₁, ..., bₙ of B. First
suppose that C has only one column c with entries c₁, ..., cₙ:

AB has columns Ab₁, ..., Abₙ and Bc has one column c₁b₁ + ··· + cₙbₙ.

Then (AB)c = c₁Ab₁ + ··· + cₙAbₙ while A(Bc) = A(c₁b₁ + ··· + cₙbₙ).


Linearity makes those last two equal: (AB)c = ____. The same is true for all
other ____ of C. Therefore (AB)C = A(BC).


17 For A = [4 Z} ] and B = [10 4], compute these answers and nothing more:


(a) column 2 of AB
(b) row 2 of AB
(c) row 2 of AA
(d) row 2 of AAA


Problems 18-20 use aᵢⱼ for the entry in row i, column j of A.


18 Write down the 3 by 3 matrix A whose entries are

(a) aᵢⱼ = i + j
(b) aᵢⱼ = (−1)ⁱ⁺ʲ
(c) aᵢⱼ = i/j.


19 What words would you use to describe each of these classes of matrices? Give an
example in each class. Which matrices belong to all four classes?

(a) aᵢⱼ = 0 if i ≠ j
(b) aᵢⱼ = 0 if i < j
(c) aᵢⱼ = aⱼᵢ
(d) aᵢⱼ = a₁ⱼ.


20 The entries of A are aᵢⱼ. Assuming that zeros don’t appear, what is


(a) the first pivot?
(b) the multiplier of row 1 to be subtracted from row i?
(c) the new entry that replaces aᵢ₂ after that subtraction?
(d) the second pivot?


Problems 21-25 involve powers of A.


21 Compute A², A³, A⁴ and also Av, A²v, A³v, A⁴v for


and v= 


ooo = 
O ore © 
O -=-= o o 
~o N Y & 


22 Find all the powers A², A³, ... and AB, (AB)², ... for
oO 2) 1 0 
oe E d an ere p af 
23 By trial and error find 2 by 2 matrices (of real numbers) such that


(a) A² = −I
(b) BC = −CB (not allowing BC = 0).


24 (a) Find a nonzero matrix A for which A² = 0.
(b) Find a matrix that has A² ≠ 0 but A³ = 0.


25 By experiment with n = 2 and n = 3 predict Aⁿ for


2 4 1 1 a b 
a=| 9 ‘| and a=; l and =| T 


Problems 26-34 use column-row multiplication and block multiplication.


26 Multiply AB using columns times rows: 


1 0 1 
ap=|2 affi > pl=|21 3 o]+ = 
2: ij 2 


27 The product of upper triangular matrices is upper triangular: 


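$$AB = \begin{bmatrix} x & x & x \\ 0 & x & x \\ 0 & 0 & x \end{bmatrix} \begin{bmatrix} x & x & x \\ 0 & x & x \\ 0 & 0 & x \end{bmatrix} = \begin{bmatrix} \ & \ & \ \\ 0 & \ & \ \\ 0 & 0 & \ \end{bmatrix}.$$

Row times column is dot product (Row 2 of A) · (column 1 of B) = 0. Which
other dot products give zeros?

Column times row is full matrix Draw x’s and 0’s in (column 2 of A) (row 2 of B)
and in (column 3 of A) (row 3 of B).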


28 Draw the cuts in A (2 by 3) and B (3 by 4) and AB to show how each of the four 
multiplication rules is really a block multiplication: 


(1) Matrix A times columns of B. 
(2) Rows of A times matrix B. 
(3) Rows of A times columns of B. 


(4) Columns of A times rows of B.


29 Draw the cuts in A and x to multiply Ax a column at a time: x₁ times column 1 + ···.


30 Which matrices E₂₁ and E₃₁ produce zeros in the (2, 1) and (3, 1) positions of E₂₁A
and E₃₁A?

$$A = \begin{bmatrix} 2 & 1 & 0 \\ -2 & 0 & 1 \\ 8 & 5 & 3 \end{bmatrix}$$


Find the single matrix E = E₃₁E₂₁ that produces both zeros at once. Multiply EA.


31 Block multiplication says in Example 2.9 that 


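$$EA = \begin{bmatrix} 1 & 0 \\ -c/a & I \end{bmatrix} \begin{bmatrix} a & b \\ c & D \end{bmatrix} = \begin{bmatrix} a & b \\ 0 & D - cb/a \end{bmatrix}.$$


In Problem 30, what are c and D and what is D − cb/a?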


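32 With i² = −1, the product of (A + iB) and (x + iy) is Ax + iBx + iAy − By. Use
blocks to separate the real part without i from the imaginary part that multiplies i:


$$\begin{bmatrix} A & -B \\ ? & ? \end{bmatrix} \begin{bmatrix} x \\ ? \end{bmatrix} = \begin{bmatrix} Ax - By \\ ? \end{bmatrix} \quad \begin{matrix} \text{real part} \\ \text{imaginary part} \end{matrix}$$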


33 Each complex multiplication (a + ib)(c + id) seems to need ac, bd, ad, bc. But
notice that ad + bc = (a + b)(c + d) − ac − bd; those 3 multiplications are enough. For
matrices A, B, C, D this “3M method” becomes 3n³ versus 4n³. Check additions:
(a) How many additions for a dot product? For n by n matrix multiplication? 

(b) (Old method) How many additions for R = AC — BD and S = AD + BC? 


(c) (3M method) How many additions for R and S = (A+B)(C+D)—AC—BD? 
For n = 2 the multiplications are 24 versus 32. Additions are 32 versus 24. 
For n > 2 the 3M method clearly wins. 


34 Which cut is necessary in B? Which two cuts are optional in B? 
x x x x: & 
AB=|x x |x x x x |. (Keep A asis.) 
x x x x x 


35 Suppose you solve Ax = b for three special right sides b: 


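$$b = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$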


If the three solutions are the columns of a matrix X, what is A times X? 


36 If the three solutions in Question 35 are x₁ = (1, 1, 1) and x₂ = (0, 1, 1) and
x₃ = (0, 0, 1), what is the solution when b = (3, 5, 8)? Challenge problem: What is
the matrix A?


2.4 Inverse Matrices 


Suppose A is a square matrix. We look for a matrix A⁻¹ of the same size, such that A⁻¹
times A equals I. Whatever A does, A⁻¹ undoes. Their product is the identity matrix,
which does nothing. But this “inverse matrix” might not exist.


What a matrix mostly does is to multiply a vector x. That gives Ax. Multiplying Ax by
A⁻¹ gives A⁻¹(Ax), which is also (A⁻¹A)x. We are back to Ix = x. The product A⁻¹A
is like multiplying by a number and then dividing by that number. An ordinary number
has an inverse if it is not zero; matrices are more complicated and more interesting. The
matrix A⁻¹ is called “A inverse.”


Not all matrices have inverses. This is the first question we ask about a square matrix: is
it invertible? We don’t mean that our first calculation is immediately to find A⁻¹. In most
problems we never compute it! The inverse exists if and only if elimination produces n
pivots. (This is proved later. We must allow row exchanges to help find those pivots.)
Elimination solves Ax = b without knowing or using A⁻¹. But the idea of invertibility is
fundamental.


Definition The matrix A is invertible if there exists a matrix A⁻¹ such that


A⁻¹A = I and AA⁻¹ = I. (2.10)


Note 1 The matrix A cannot have two different inverses. Suppose BA = I and also 
AC = I. Then 


B(AC) = (BA)C gives BI = IC or B = C.


This shows that a left-inverse B (multiplying from the left) and a right-inverse C (multi- 
plying from the right) must be the same matrix. 


Note 2 If A is invertible then the one and only solution to Ax = b is x = A⁻¹b:


Multiply Ax = b by A⁻¹ to find x = A⁻¹Ax = A⁻¹b.


Note 3 (Important) Suppose there is a nonzero vector x such that Ax = 0. Then A
cannot have an inverse. No matrix can bring 0 back to x. When A is invertible, multiply
Ax = 0 by A⁻¹ to find the one and only solution x = 0.


Note 4 A 2 by 2 matrix is invertible if and only if ad − bc is not zero:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \qquad (2.11)$$


This number ad − bc is the determinant of A. A matrix is invertible if its determinant is not
zero (Chapter 5). The test for n pivots is usually decided before the determinant appears.
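
Formula (2.11) translates directly into a few lines of code. The following is a minimal sketch using exact fractions; the function name `inverse_2x2` and the sample matrices are assumptions made for illustration:

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]] by the ad - bc formula, or None if ad - bc = 0."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        return None                      # not invertible
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

print(inverse_2x2([[2, 3], [4, 7]]))     # determinant 2
print(inverse_2x2([[1, 2], [1, 2]]))     # determinant 0, so None
```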


Note 5 A diagonal matrix is invertible when none of its diagonal entries are zero:


$$\text{If}\quad A = \begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{bmatrix} \quad\text{then}\quad A^{-1} = \begin{bmatrix} 1/d_1 & & \\ & \ddots & \\ & & 1/d_n \end{bmatrix}.$$


Note 6 The 2 by 2 matrix $A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}$ is not invertible. It fails the test in Note 4, because
ad − bc equals 2 − 2 = 0. It fails the test in Note 3, because Ax = 0 when x = (2, −1).
It also fails to have two pivots. Elimination turns the second row of A into a zero row.


The Inverse of a Product


For two nonzero numbers a and b, the sum a + b might or might not be invertible. The
numbers a = 3 and b = −3 have inverses 1/3 and −1/3. Their sum a + b = 0 has no inverse.
But the product ab = −9 does have an inverse, which is 1/3 times −1/3.

For two matrices A and B, the situation is similar. It is hard to say much about the
invertibility of A + B. But the product AB has an inverse, whenever the factors A and B
are separately invertible. The important point is that A⁻¹ and B⁻¹ come in reverse order:


2H If A and B are invertible then so is AB. The inverse of AB is

(AB)⁻¹ = B⁻¹A⁻¹. (2.12)

To see why the order is reversed, start with AB. Multiplying on the left by A⁻¹ leaves B.
Multiplying by B⁻¹ leaves the identity matrix I:

B⁻¹A⁻¹AB equals B⁻¹IB = B⁻¹B = I.


Similarly AB times B⁻¹A⁻¹ equals AIA⁻¹ = AA⁻¹ = I. This illustrates a basic rule of
mathematics: Inverses come in reverse order. It is also common sense: If you put on socks
and then shoes, the first to be taken off are the ____. The same idea applies to three or
more matrices:


(ABC)⁻¹ = C⁻¹B⁻¹A⁻¹. (2.13)
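
The reverse-order rule is easy to test numerically. This sketch assumes two arbitrary invertible 2 by 2 matrices and reuses the ad − bc formula for the inverses; all names are illustrative:

```python
from fractions import Fraction

def matmul(A, B):
    """Rows-times-columns product; matrices are stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inverse_2x2(M):
    """Inverse of an invertible [[a, b], [c, d]] by the ad - bc formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[1, 2], [3, 4]]      # assumed example, determinant -2
B = [[0, 1], [1, 1]]      # assumed example, determinant -1

inverse_of_product = inverse_2x2(matmul(A, B))
reverse_order = matmul(inverse_2x2(B), inverse_2x2(A))   # B^-1 A^-1
same_order = matmul(inverse_2x2(A), inverse_2x2(B))      # A^-1 B^-1

print(inverse_of_product == reverse_order, inverse_of_product == same_order)
```

For this pair the reverse order matches the inverse of AB, and the same order does not.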


Example 2.10 If E is the elementary matrix that subtracts 5 times row 1 from row 2, then
E⁻¹ adds 5 times row 1 to row 2:


$$E = \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad E^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$


Multiply EE⁻¹ to get the identity matrix I. Also multiply E⁻¹E to get I. We are adding
and subtracting the same 5 times row 1. Whether we add and then subtract (this is EE⁻¹)
or subtract first and then add (this is E⁻¹E), we are back at the start.


Note For square matrices, an inverse on one side is automatically an inverse on the other
side. If AB = I then automatically BA = I. In that case B is A⁻¹. This is very useful to
know but we are not ready to prove it.


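Example 2.11 Suppose F subtracts 2 times row 2 from row 3, and F⁻¹ adds it back:


$$F = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \quad\text{and}\quad F^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}.$$

Again FF⁻¹ = I and also F⁻¹F = I. Now multiply F by the matrix E in Example 2.10
to find FE. Also multiply E⁻¹ times F⁻¹ to find (FE)⁻¹. Notice the order of those
inverses!


$$FE = \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ 10 & -2 & 1 \end{bmatrix} \quad\text{is inverted by}\quad E^{-1}F^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}. \qquad (2.14)$$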


This is strange but correct. The product FE contains “10” but its inverse doesn’t. You can
check that FE times E⁻¹F⁻¹ gives I.

There must be a reason why 10 appears in FE but not in its inverse. E subtracts 5
times row 1 from row 2. Then F subtracts 2 times the new row 2 (which contains −5 times
row 1) from row 3. In this order FE, row 3 feels an effect from row 1. The effect is to
subtract −10 times row 1 from row 3, and this accounts for the 10.


In the order E⁻¹F⁻¹, that effect does not happen. First F⁻¹ adds 2 times row 2 to row 3.
After that, E⁻¹ adds 5 times row 1 to row 2. There is no 10. The example makes two
points:


1 Usually we cannot find A⁻¹ from a quick look at A.


2 When elementary matrices come in their normal order GFE, we can find the product
of inverses E⁻¹F⁻¹G⁻¹ quickly. The multipliers fall into place below the diagonal.
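
The products in Examples 2.10 and 2.11 can be multiplied out mechanically. This is a small sketch with an assumed helper `matmul`; the matrices are the E and F described in those examples:

```python
def matmul(A, B):
    """Rows-times-columns product; matrices are stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

E = [[1, 0, 0], [-5, 1, 0], [0, 0, 1]]      # subtract 5 times row 1 from row 2
F = [[1, 0, 0], [0, 1, 0], [0, -2, 1]]      # subtract 2 times row 2 from row 3
E_inv = [[1, 0, 0], [5, 1, 0], [0, 0, 1]]   # add 5 times row 1 to row 2
F_inv = [[1, 0, 0], [0, 1, 0], [0, 2, 1]]   # add 2 times row 2 to row 3

FE = matmul(F, E)                           # the entry 10 appears in position (3, 1)
E_inv_F_inv = matmul(E_inv, F_inv)          # multipliers 5 and 2 fall into place
check = matmul(FE, E_inv_F_inv)             # should be the identity

print(FE, E_inv_F_inv, check)
```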


This special property of E⁻¹F⁻¹G⁻¹ will be useful in the next section. We will explain it
again, more completely. In this section our job is A⁻¹, and we expect some serious work
to compute it. Here is a way to organize that computation.


The Calculation of A⁻¹ by Gauss-Jordan Elimination


It was hinted earlier that A⁻¹ might not be explicitly needed. The equation Ax = b is
solved by x = A⁻¹b. But it is not necessary or efficient to compute A⁻¹ and multiply it
times b. Elimination goes directly to x. In fact elimination is also the way to calculate A⁻¹,
as we now show.

The idea is to solve AA⁻¹ = I a column at a time. A multiplies the first column
of A⁻¹ (call that x₁) to give the first column of I (call that e₁). This is our equation
Ax₁ = e₁. Each column of A⁻¹ is multiplied by A to produce a column of I:


AA⁻¹ = A[x₁ x₂ x₃] = [e₁ e₂ e₃] = I. (2.15)

To invert a 3 by 3 matrix A, we have to solve three systems of equations: Ax₁ = e₁ and
Ax₂ = e₂ and Ax₃ = e₃. Then x₁, x₂, x₃ are the columns of A⁻¹.


Note This already shows why computing A⁻¹ is expensive. We must solve n equations
for its n columns. To solve Ax = b directly, we deal only with one column.


In defense of A⁻¹, we want to say that its cost is not n times the cost of one system
Ax = b. Surprisingly, the cost for n columns is only multiplied by 3. This saving is
because the n equations Axᵢ = eᵢ all involve the same matrix A. When we have many
different right sides, elimination only has to be done once on A. Working with the right
sides is relatively cheap. The complete A⁻¹ needs n³ elimination steps, whereas solving
for a single x needs n³/3. The next section calculates these costs.


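The Gauss-Jordan method computes A⁻¹ by solving all n equations together. Usually the
“augmented matrix” has one extra column b, from the right side of the equations. Now
we have three right sides e₁, e₂, e₃ (when A is 3 by 3). They are the columns of I, so the
augmented matrix is really just [A I]. Here is a worked-out example when A has 2’s on
the main diagonal and −1’s next to the 2’s:


$$\begin{aligned} \begin{bmatrix} A & e_1 & e_2 & e_3 \end{bmatrix} &= \left[\begin{array}{rrr|rrr} 2 & -1 & 0 & 1 & 0 & 0 \\ -1 & 2 & -1 & 0 & 1 & 0 \\ 0 & -1 & 2 & 0 & 0 & 1 \end{array}\right] \\ &\to \left[\begin{array}{rrr|rrr} 2 & -1 & 0 & 1 & 0 & 0 \\ 0 & \tfrac32 & -1 & \tfrac12 & 1 & 0 \\ 0 & -1 & 2 & 0 & 0 & 1 \end{array}\right] \quad (\tfrac12 \text{ row 1} + \text{row 2}) \\ &\to \left[\begin{array}{rrr|rrr} 2 & -1 & 0 & 1 & 0 & 0 \\ 0 & \tfrac32 & -1 & \tfrac12 & 1 & 0 \\ 0 & 0 & \tfrac43 & \tfrac13 & \tfrac23 & 1 \end{array}\right] \quad (\tfrac23 \text{ row 2} + \text{row 3}) \end{aligned}$$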


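We are halfway. The matrix in the first three columns is now U (upper triangular). The
pivots 2, 3/2, 4/3 are on its diagonal. Gauss would finish by back substitution on each of the
three equations. The contribution of Jordan is to continue with elimination, all the way
to the “reduced echelon form” R. Rows are added to rows above them, to produce zeros
above the pivots:


$$\begin{aligned} &\to \left[\begin{array}{rrr|rrr} 2 & -1 & 0 & 1 & 0 & 0 \\ 0 & \tfrac32 & 0 & \tfrac34 & \tfrac32 & \tfrac34 \\ 0 & 0 & \tfrac43 & \tfrac13 & \tfrac23 & 1 \end{array}\right] \quad (\tfrac34 \text{ row 3} + \text{row 2}) \\ &\to \left[\begin{array}{rrr|rrr} 2 & 0 & 0 & \tfrac32 & 1 & \tfrac12 \\ 0 & \tfrac32 & 0 & \tfrac34 & \tfrac32 & \tfrac34 \\ 0 & 0 & \tfrac43 & \tfrac13 & \tfrac23 & 1 \end{array}\right] \quad (\tfrac23 \text{ row 2} + \text{row 1}) \end{aligned}$$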


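Now divide each row by its pivot. (That can be done earlier.) The new pivots are 1. The
result is to produce I in the first half of the matrix. (This reduced matrix is R = I because
A is invertible. More later.) The columns of A⁻¹ are in the second half.


$$\begin{array}{l} (\text{divide by } 2) \\ (\text{divide by } \tfrac32) \\ (\text{divide by } \tfrac43) \end{array} \quad \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & \tfrac34 & \tfrac12 & \tfrac14 \\ 0 & 1 & 0 & \tfrac12 & 1 & \tfrac12 \\ 0 & 0 & 1 & \tfrac14 & \tfrac12 & \tfrac34 \end{array}\right] = \begin{bmatrix} I & x_1 & x_2 & x_3 \end{bmatrix}.$$


Starting from the 3 by 6 matrix [A I], we have reached [I A⁻¹]. Here is the whole
Gauss-Jordan process on one line:


Multiply [A I] by A⁻¹ to get [I A⁻¹].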

The elimination steps gradually multiply by A⁻¹. They end up with the actual inverse
matrix. Again, we probably don’t need A⁻¹ at all. But for small matrices, it can be
worthwhile to know the inverse. We add three observations about this particular A⁻¹
because it is an important example:


1 A is symmetric across its main diagonal. So is A⁻¹.


2 The product of the pivots is 2(3/2)(4/3) = 4. This number 4 is the determinant of A.
Chapter 5 will show why 4 is also the denominator in A⁻¹:


$$A^{-1} = \frac{1}{4} \begin{bmatrix} 3 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}. \qquad (2.16)$$


3 The matrix A is tridiagonal (three nonzero diagonals). But A⁻¹ is a full matrix with
no zeros. That is another reason we don’t often compute A⁻¹.


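Example 2.12 Find A⁻¹ by Gauss-Jordan elimination with $A = \begin{bmatrix} 2 & 3 \\ 4 & 7 \end{bmatrix}$.

$$\begin{bmatrix} A & I \end{bmatrix} = \left[\begin{array}{rr|rr} 2 & 3 & 1 & 0 \\ 4 & 7 & 0 & 1 \end{array}\right] \to \left[\begin{array}{rr|rr} 2 & 3 & 1 & 0 \\ 0 & 1 & -2 & 1 \end{array}\right] \to \left[\begin{array}{rr|rr} 2 & 0 & 7 & -3 \\ 0 & 1 & -2 & 1 \end{array}\right] \to \left[\begin{array}{rr|rr} 1 & 0 & \tfrac72 & -\tfrac32 \\ 0 & 1 & -2 & 1 \end{array}\right].$$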

The last matrix shows A⁻¹. The reduced echelon form of [A I] is [I A⁻¹]. The code
for X = inverse(A) has only three important lines! The teaching code uses ref where
MATLAB itself would use rref (or go directly to inv):


I = eye(n,n);
R = ref([A I]);
X = R(:,n + 1:n + n)


The last line discards columns 1 to n, in the left half of R. It picks out X = A⁻¹ from the
right half. Of course A must be invertible, or the left half will not be I.
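
For readers without MATLAB, the same idea can be sketched in plain Python with exact fractions. This is not the teaching code: the function name `gauss_jordan_inverse` is an assumption, and the sketch divides each pivot row by its pivot immediately instead of at the end, which reaches the same reduced form [I A⁻¹]:

```python
from fractions import Fraction

def gauss_jordan_inverse(A):
    """Reduce [A I] to [I A^-1]; return the right half, or None if A is singular."""
    n = len(A)
    # Augmented matrix [A I] with exact rational entries.
    R = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look for a nonzero pivot at or below the diagonal (row exchange if needed).
        pivot_row = next((r for r in range(col, n) if R[r][col] != 0), None)
        if pivot_row is None:
            return None                  # no pivot in this column
        R[col], R[pivot_row] = R[pivot_row], R[col]
        # Divide the pivot row by the pivot, then clear the column above and below.
        pivot = R[col][col]
        R[col] = [x / pivot for x in R[col]]
        for r in range(n):
            if r != col:
                factor = R[r][col]
                R[r] = [x - factor * p for x, p in zip(R[r], R[col])]
    return [row[n:] for row in R]        # discard columns 1 to n

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
X = gauss_jordan_inverse(A)
print(X)
```

Applied to the −1, 2, −1 matrix of the worked example, the result can be compared with equation (2.16).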


Singular versus Invertible 


We come back to the central question. Which matrices have inverses? At the start of this
section we proposed the pivot test: The inverse exists exactly when A has a full set of n
pivots. (Row exchanges are allowed!) Now we can prove that fact, by looking carefully at
elimination:


1 With n pivots, elimination solves all the equations Axᵢ = eᵢ. The solutions xᵢ go
into A⁻¹. Then AA⁻¹ = I and A⁻¹ is at least a right-inverse.


2 Gauss-Jordan elimination is really a long sequence of matrix multiplications:


(D⁻¹ ··· E ··· P ··· E)A = I.


The factor D⁻¹ divides by the pivots. The matrices E produce zeros, the matrices P
exchange rows (details in Section 2.6). The product matrix in parentheses is evidently a
left-inverse that gives A⁻¹A = I. It equals the right-inverse by Note 1. With a full set of
pivots A is invertible.

The converse is also true. If A⁻¹ exists, then A must have a full set of n pivots. Again
the argument has two steps:


1 If A has a whole row of zeros, the inverse cannot exist. Every matrix product AB
will have that zero row, and cannot be equal to I.


2 Suppose a column has no pivot. At that point the matrix can look like


$$\begin{bmatrix} d_1 & x & x & x \\ 0 & d_2 & x & x \\ 0 & 0 & 0 & d_3 \\ 0 & 0 & 0 & x \end{bmatrix}.$$


In case d₃ = 0, we have a row of zeros. Otherwise subtract a multiple of row 3 so that
row 4 is all zero. We have reached a matrix with no inverse. Since each elimination step is
invertible, the original A was not invertible. Without pivots A⁻¹ fails.


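2I Elimination gives a complete test for A⁻¹ to exist. There must be n pivots.


Example 2.13 If L is lower triangular with 1’s on the diagonal, so is L⁻¹.


Use the Gauss-Jordan method to construct L⁻¹. Start by subtracting multiples of pivot
rows from rows below. Normally this gets us halfway to the inverse, but for L it gets us all
the way:


$$\begin{aligned} \begin{bmatrix} L & I \end{bmatrix} &= \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ a & 1 & 0 & 0 & 1 & 0 \\ b & c & 1 & 0 & 0 & 1 \end{array}\right] \\ &\to \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -a & 1 & 0 \\ 0 & c & 1 & -b & 0 & 1 \end{array}\right] \quad \begin{matrix} (a \text{ times row 1 from row 2}) \\ (b \text{ times row 1 from row 3}) \end{matrix} \\ &\to \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -a & 1 & 0 \\ 0 & 0 & 1 & ac-b & -c & 1 \end{array}\right] = \begin{bmatrix} I & L^{-1} \end{bmatrix}. \end{aligned}$$


When L goes to I by elimination, I goes to L⁻¹. In other words, the product of elimination
matrices E₃₂E₃₁E₂₁ is L⁻¹. All the pivots are 1’s (a full set) and nothing appears above
the diagonal.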
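
The formula for L⁻¹ found above can be confirmed with numbers. In this sketch the values a = 2, b = 3, c = 5 and the names `matmul` and `unit_lower_inverse` are assumptions:

```python
def matmul(A, B):
    """Rows-times-columns product; matrices are stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

def unit_lower_inverse(a, b, c):
    """Inverse of L = [[1,0,0],[a,1,0],[b,c,1]] as read off from [I L^-1]."""
    return [[1, 0, 0],
            [-a, 1, 0],
            [a * c - b, -c, 1]]

a, b, c = 2, 3, 5                      # assumed sample multipliers
L = [[1, 0, 0], [a, 1, 0], [b, c, 1]]
L_inv = unit_lower_inverse(a, b, c)

print(L_inv, matmul(L, L_inv), matmul(L_inv, L))
```

Both products give the identity, and L⁻¹ is again lower triangular with 1’s on the diagonal.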



Problem Set 2.4 


1 Find the inverses (directly or from the 2 by 2 formula) of A, B, C:


0 3 2 0 3 4 
del 4 and B=) and Gel, “if 


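2 For these “permutation matrices” find P⁻¹ by trial and error (with 1’s and 0’s):

$$P = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \quad\text{and}\quad P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$


3 Solve for the columns of $A^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$:

$$\begin{bmatrix} 10 & 20 \\ 20 & 50 \end{bmatrix} \begin{bmatrix} a \\ c \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 10 & 20 \\ 20 & 50 \end{bmatrix} \begin{bmatrix} b \\ d \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
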

4 Show that È 2 | has no inverse by trying to solve for the columns (a, c) and (b, d):


> lle al=[o 1] 


5 Find three 2 by 2 matrices (not A = I) that are their own inverses: A² = I.


6 (a) If A is invertible and AB = AC, prove quickly that B = C.
(b) If $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$, find two matrices B ≠ C such that AB = AC.


7 (Important) If the 3 by 3 matrix A has row 1 + row 2 = row 3, show that it is not
invertible: 


(a) Explain why Ax = (1, 0, 0) cannot have a solution. 
(b) Which right sides (b1, b2, b3) might allow a solution to Ax = b? 


(c) What happens to row 3 in elimination? 


8 Suppose A is invertible and you exchange its first two rows. Is the new matrix B
invertible and how would you find B⁻¹ from A⁻¹?


9 Find the inverses (in any legal way) of


and B= 


Ano Oo O 
oh O O 
O Ouo 
oo ON 
oo A U 
O OUN 
SIN © © 
NNO © 


10 (a) Find invertible matrices A and B such that A + B is not invertible.
(b) Find singular matrices A and B such that A + B is invertible. 


11 If the product C = AB is invertible (A and B are square), then A itself is invertible.
Find a formula for A⁻¹ that involves C⁻¹ and B.


12 If the product M = ABC of three square matrices is invertible, then B is invertible.
(So are A and C.) Find a formula for B⁻¹ that involves M⁻¹ and A and C.


13 If A is invertible and you add row 1 to row 2 to get B, how do you find B⁻¹
from A⁻¹?


The inverse of $B = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} A \end{bmatrix}$ is ____.


14 Prove that a matrix with a column of zeros cannot have an inverse.


15 Multiply $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ times $\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$. What is the inverse of each matrix if ad ≠ bc?


16 (a) What single matrix E has the same effect as these three steps? Subtract row 1
from row 2, subtract row 1 from row 3, then subtract row 2 from row 3. 


(b) What single matrix L has the same effect as these three steps? Add row 2 to 
row 3, add row 1 to row 3, then add row 1 to row 2. 


17 If the 3 by 3 matrix A has column 1 + column 2 = column 3, show that it is not
invertible: 


(a) Find a nonzero solution to Ax = 0. 


(b) Does elimination keep column 1 + column 2 = column 3? Explain why there 
is no third pivot. 


18 If B is the inverse of A², show that AB is the inverse of A.


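19 Find the numbers a and b that give the correct inverse:


$$\begin{bmatrix} 4 & -1 & -1 & -1 \\ -1 & 4 & -1 & -1 \\ -1 & -1 & 4 & -1 \\ -1 & -1 & -1 & 4 \end{bmatrix}^{-1} = \begin{bmatrix} a & b & b & b \\ b & a & b & b \\ b & b & a & b \\ b & b & b & a \end{bmatrix}.$$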

20 There are sixteen 2 by 2 matrices whose entries are 1’s and 0’s. How many of them
are invertible? 


Questions 21-27 are about the Gauss-Jordan method for calculating A⁻¹.


21 Change I into A⁻¹ as you reduce A to I (by row operations):


pa tE mw te rie[t 229 


22 Follow the 3 by 3 text example but with plus signs in A. Eliminate above and below
the pivots to reduce [A I] to R = [I A⁻¹]:


$$\begin{bmatrix} A & I \end{bmatrix} = \left[\begin{array}{rrr|rrr} 2 & 1 & 0 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 & 1 & 0 \\ 0 & 1 & 2 & 0 & 0 & 1 \end{array}\right].$$


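
23 Use Gauss-Jordan elimination with I next to A to solve AA⁻¹ = I:

$$\begin{bmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$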

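24 Find A⁻¹ (if it exists) by Gauss-Jordan elimination on [A I]:


$$A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}.$$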


25 What three matrices E₂₁ and E₁₂ and D⁻¹ reduce A = | } 2] to the identity matrix?
Multiply D⁻¹E₁₂E₂₁ to find A⁻¹.


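26 Invert these matrices by the Gauss-Jordan method starting with [A I]:

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix}.$$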


27 Exchange rows and continue with Gauss-Jordan to find A⁻¹:


28 True or false (with a counterexample if false):


(a) A 4 by 4 matrix with zeros is not invertible.
(b) Every matrix with 1’s down the main diagonal is invertible.
(c) If A is invertible then A⁻¹ is invertible.
(d) If A is invertible then A² is invertible.


29 For which numbers c is this matrix not invertible, and why not?


30 Prove that this matrix is invertible if a ≠ 0 and a ≠ b:


> 

I 
a a a 
a a ¢ 
8 cS 


31 This matrix has a remarkable inverse. Find A⁻¹ and guess the 5 by 5 inverse and
multiply AA⁻¹ to confirm:


$$A = \begin{bmatrix} 1 & -1 & 1 & -1 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$


32 Use the inverses in Question 31 to solve Ax = (1, 1, 1, 1) and Ax = (1, 1,1, 1, 1). 


33 The Hilbert matrices have aᵢⱼ = 1/(i + j − 1). Ask MATLAB for the exact inverse
invhilb(6). Then ask for inv(hilb(6)). How can these be different, when the computer 
never makes mistakes? 


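34 Find the inverses (assuming they exist) of these block matrices:

$$\begin{bmatrix} I & 0 \\ C & I \end{bmatrix} \qquad \begin{bmatrix} A & 0 \\ C & D \end{bmatrix} \qquad \begin{bmatrix} 0 & I \\ I & D \end{bmatrix}.$$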


2.5 Elimination = Factorization: A = LU 


Students often say that mathematics courses are too theoretical. Well, not this section. 
It is almost purely practical. The goal is to describe Gaussian elimination in the most 
useful way. Many key ideas of linear algebra, when you look at them closely, are really 
factorizations of a matrix. The original matrix A becomes the product of two or three 
special matrices. The first factorization—also the most important in practice—comes now 
from elimination. The factors are triangular matrices. 

Start with a 2 by 2 example. There is only one nonzero to eliminate from A (it is the
number 6). Subtract 3 times row 1 from row 2. That step is E₂₁ in the forward direction (a
subtraction). The return step from U to A is E₂₁⁻¹ (an addition):


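$$E_{21}A = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 6 & 8 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 0 & 5 \end{bmatrix} = U$$

$$E_{21}^{-1}U = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 0 & 5 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 6 & 8 \end{bmatrix} = A.$$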


The second line is our factorization. Instead of A = E₂₁⁻¹U we write A = LU. Move now
to larger matrices with many E’s. Then L will include all their inverses.

Each step from A to U is a multiplication by a simple matrix. The elimination
matrices Eᵢⱼ produce zeros below the pivots. (Eᵢⱼ subtracts a multiple of row j from
row i.) The row exchange matrices Pᵢⱼ move nonzeros up to the pivot position. (Pᵢⱼ
exchanges rows i and j. It comes in the next section.) A long sequence of E’s and P’s
multiplies A to produce U. When we put each E⁻¹ and P⁻¹ on the other side of the
equation, they multiply U and bring back A:


A = (E⁻¹ ··· P⁻¹ ··· E⁻¹)U. (2.17)


This is a factorization, but with too many factors. The matrices E⁻¹ and P⁻¹ are too
simple. The good factorization has a single matrix L to account for all the separate
factors E⁻¹. Another matrix P accounts for all the row exchanges. Then A is built out of
L, P, and U.

To keep this clear, we want to start with the most frequent case, when A is invertible
and P = I. No row exchanges are involved. If A is 3 by 3, we multiply it by E₂₁ and
then by E₃₁ and E₃₂. That produces zero in the (2, 1) and (3, 1) and (3, 2) positions, all
below the diagonal. We end with U. Now move those E’s onto the other side, where their
inverses multiply U:


(E₃₂E₃₁E₂₁)A = U becomes A = (E₂₁⁻¹E₃₁⁻¹E₃₂⁻¹)U which is A = LU. (2.18)


The inverses go in opposite order, as they must. That product of three inverses is L. We 
have reached A = LU. Now we stop to understand it. 


First point Every matrix Eᵢⱼ⁻¹ is lower triangular with 1’s on the diagonal. Its off-diagonal
entry is +l, to undo the subtraction in Eᵢⱼ with −l. These numbers multiply the pivot row j
to change row i. Subtraction goes from A toward U, and addition brings back A. The main
diagonals of E and E⁻¹ contain all 1’s.


Second point The product of several E's is still lower triangular. Equation (2.18) shows
a lower triangular matrix (in parentheses) multiplying A to get U. It also shows a lower
triangular matrix (the product of the $E_{ij}^{-1}$) multiplying U to bring back A. This product of
inverses is L.

There are two good reasons for working with the inverses. One reason is that we want 
to factor A, not U. By choosing the “inverse form” of equation (2.18), we get A = LU. 
The second reason is that we get something extra, almost more than we deserve. This is 
the third point, which shows that L is exactly right. 


Third point Each multiplier $l_{ij}$ goes directly into its i, j position, unchanged, in the
product L. Usually a matrix multiplication will mix up and combine all the numbers. Here
that doesn't happen. The order is right for the inverse matrices, to keep the l's separate. We
checked that directly for the two matrices $E^{-1}$ and $F^{-1}$ in the last section. The explanation
below is more general.

Since each $E^{-1}$ has 1's down its diagonal, the final good point is that L does too.


2J (A = LU) Elimination without row exchanges factors A into LU. The upper triangular
U has the pivots on its diagonal. The lower triangular L has 1's on its diagonal, with
the multipliers $l_{ij}$ in position below the diagonal.
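Example 2.14 For the matrix A with -1, 2, -1 on its diagonals, elimination subtracts $-\frac{1}{2}$
times row 1 from row 2. Then $l_{21} = -\frac{1}{2}$. The last step subtracts $-\frac{2}{3}$ times row 2 from
row 3, to reach U:


\[
A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ -\frac{1}{2} & 1 & 0 \\ 0 & -\frac{2}{3} & 1 \end{bmatrix}
\begin{bmatrix} 2 & -1 & 0 \\ 0 & \frac{3}{2} & -1 \\ 0 & 0 & \frac{4}{3} \end{bmatrix} = LU.
\]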


The (3, 1) multiplier is zero because the (3, 1) entry in A is zero. No operation needed. 


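Example 2.15 Change the top left entry from 2 to 1. The multipliers all become -1. The
pivots are all +1. That pattern continues when A is 4 by 4 or n by n:


\[
A = \begin{bmatrix} 1 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}
= \begin{bmatrix} 1 & & & \\ -1 & 1 & & \\ 0 & -1 & 1 & \\ 0 & 0 & -1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -1 & 0 & 0 \\ & 1 & -1 & 0 \\ & & 1 & -1 \\ & & & 1 \end{bmatrix}.
\]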


These examples are showing something extra about L and U, which is very important in 
practice. Assume no row exchanges. When can we predict zeros in L and U? 


When rows of A start with zeros, so do the corresponding rows of L. 


When columns of A start with zeros, so do the corresponding columns of U. 


If a row below the pivot starts with zero, we don’t need an elimination step. The three 
zeros below the diagonal in A gave the same three zeros in L. The multipliers $l_{ij}$ were
zero. That saves computer time. Similarly, zeros at the start of a column are not touched 
by elimination. They survive into the columns of U. But please realize: Zeros in the middle 
of a matrix are likely to be filled in, while elimination sweeps forward. 


We now explain why L has the multipliers $l_{ij}$ in position, with no mix-up.


The key reason why A equals LU: Ask yourself about the rows that are subtracted from 
lower rows. Are they the original rows of A? No, elimination probably changed them. Are 
they rows of U? Yes, the pivot rows never get changed again. For the third row of U, we 
are subtracting multiples of rows 1 and 2 of U (not rows of A!): 


\[ \text{Row 3 of } U = \text{row 3 of } A - l_{31}(\text{row 1 of } U) - l_{32}(\text{row 2 of } U). \tag{2.19} \]

Rewrite that equation:


\[ \text{Row 3 of } A = l_{31}(\text{row 1 of } U) + l_{32}(\text{row 2 of } U) + 1(\text{row 3 of } U). \tag{2.20} \]

The right side shows the row $[\,l_{31}\ \ l_{32}\ \ 1\,]$ multiplying the matrix U. This is row 3 of
A = LU. All rows look like this, whatever the size of A. As long as there are no row
exchanges, we have A = LU.
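
The row-by-row argument can be checked numerically. The following is a minimal Python sketch, not one of the book's M-files: the function names lu_no_exchanges and matmul are illustrative, and exact fractions are used so the multipliers of Example 2.14 appear without rounding. Each multiplier is stored in L at the moment it is used, and the pivot rows that get subtracted are rows of U.

```python
from fractions import Fraction

def lu_no_exchanges(A):
    """Factor a square matrix A into L and U, assuming no row exchanges are needed."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]   # working copy that becomes U
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        if U[k][k] == 0:
            raise ZeroDivisionError("zero pivot: a row exchange would be needed")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]             # multiplier l_ik goes straight into L
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]        # subtract l_ik times pivot row k of U
    return L, U

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]           # the matrix of Example 2.14
L, U = lu_no_exchanges(A)
```

Multiplying L times U with matmul should return the original A, and the entries below the diagonal of L are the multipliers in their own positions.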


Remark The LU factorization is “unsymmetric” in one respect. U has the pivots on its 
diagonal where L has 1’s. This is easy to change. Divide U by a diagonal matrix D that 
contains the pivots. Then U equals D times a matrix with 1’s on the diagonal: 


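\[
U = \begin{bmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_n \end{bmatrix}
\begin{bmatrix} 1 & u_{12}/d_1 & u_{13}/d_1 & \cdots \\ & 1 & u_{23}/d_2 & \cdots \\ & & \ddots & \vdots \\ & & & 1 \end{bmatrix}
\]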


It is convenient (but confusing) to keep the same letter U for this new upper triangular 
matrix. It has 1’s on the diagonal (like L). Instead of the normal LU, the new form has 
LDU. Lower triangular times diagonal times upper triangular: 


The triangular factorization can be written A=LU or A= LDU. 


Whenever you see LDU, it is understood that U has 1’s on the diagonal. Each row is 
divided by its first nonzero entry—the pivot. Then L and U are treated evenly. Here is LU 
and then also LDU: 


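\[
\begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 8 \\ 0 & 5 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix} \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}
\]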


The pivots 2 and 5 went into D. Dividing the rows by 2 and 5 left the rows [1 4] and
[0 1] in the new U. The multiplier 3 is still in L.
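
As a small illustration of the split, the sketch below divides each row of U by its pivot. It is a hedged example in Python with an illustrative function name (split_ldu), and it assumes every pivot is nonzero.

```python
from fractions import Fraction

def split_ldu(U):
    """Split upper triangular U (nonzero pivots assumed) into D and a unit upper triangular matrix."""
    n = len(U)
    D = [[Fraction(U[i][i]) if i == j else Fraction(0) for j in range(n)] for i in range(n)]
    U1 = [[Fraction(U[i][j]) / U[i][i] for j in range(n)] for i in range(n)]  # row i divided by pivot i
    return D, U1

U = [[2, 8], [0, 5]]       # the U with pivots 2 and 5 from the text
D, U1 = split_ldu(U)       # D holds the pivots, U1 has 1's on the diagonal
```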


One Square System = Two Triangular Systems 


We emphasized that L contains our memory of Gaussian elimination. It holds all the 
numbers that multiplied the pivot rows, before subtracting them from lower rows. When 
do we need this record and how do we use it? 

We need to remember L as soon as there is a right side b. The factors L and U were 
completely decided by the left side (the matrix A). Now work on the right side. Most 
computer programs for linear equations separate the problem Ax = b this way, into two 
different subroutines: 


1 Factor (from A find L and U) 
2 Solve (from L and U and b find x). 


Up to now, we worked on b while we were working on A. No problem with that:
just augment the matrix by an extra column. The Gauss-Jordan method for $A^{-1}$ worked on
n right sides at once. But most computer codes keep the two sides separate. The memory
of forward elimination is held in L and U, at no extra cost in storage. Then we process b
whenever we want to. The User's Guide to LINPACK remarks that “This situation is so
common and the savings are so important that no provision has been made for solving a
single system with just one subroutine.”


How do we process b? First, apply the forward elimination steps (which are stored in L).
Second, apply back substitution (using U). The first part changes b to a new right side c;
we are really solving Lc = b. Then back substitution solves Ux = c. The original system
Ax = b is factored into two triangular systems:


Solve Lc=b and then solve Ux=c. (2.22) 


To see that x is the correct solution, multiply Ux = c by L. Then LU equals A and Lc
equals b. We have Ax = b.

To emphasize: There is nothing new about equation (2.22). This is exactly what 
we have done all along. Forward elimination changed b into c as it changed A into U, 
by working on both sides of the equation. We were really solving the triangular system 
Lc = b by “forward substitution.” Whether done at the same time or later, this prepares
the right side for back substitution. An example shows it all: 


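Example 2.16 Forward elimination ends at Ux = c. Here c is $\begin{bmatrix} 8 \\ 5 \end{bmatrix}$:


\[
\begin{aligned} 2u + 2v &= 8 \\ 4u + 9v &= 21 \end{aligned}
\quad\text{becomes}\quad
\begin{aligned} 2u + 2v &= 8 \\ 5v &= 5. \end{aligned}
\]


The multiplier is 2. It goes into L. We have just solved Lc = b:


\[
Lc = b \quad \text{The lower triangular system } \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 8 \\ 21 \end{bmatrix} \text{ gives } c = \begin{bmatrix} 8 \\ 5 \end{bmatrix}.
\]


\[
Ux = c \quad \text{The upper triangular system } \begin{bmatrix} 2 & 2 \\ 0 & 5 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} 8 \\ 5 \end{bmatrix} \text{ gives } x = \begin{bmatrix} 3 \\ 1 \end{bmatrix}.
\]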
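
The two triangular solves of this example can be written out directly. This is a minimal Python sketch with illustrative function names (forward_substitution, back_substitution); it assumes L and U are already known and that their diagonal entries are nonzero.

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Solve Lc = b from the top row down."""
    n = len(b)
    c = [Fraction(0)] * n
    for k in range(n):
        s = sum(L[k][j] * c[j] for j in range(k))      # contributions of c already found
        c[k] = (Fraction(b[k]) - s) / L[k][k]
    return c

def back_substitution(U, c):
    """Solve Ux = c from the bottom row up."""
    n = len(c)
    x = [Fraction(0)] * n
    for k in range(n - 1, -1, -1):
        t = sum(U[k][j] * x[j] for j in range(k + 1, n))  # unknowns already found
        x[k] = (c[k] - t) / U[k][k]
    return x

L = [[1, 0], [2, 1]]        # the multiplier 2 sits below the diagonal
U = [[2, 2], [0, 5]]
b = [8, 21]
c = forward_substitution(L, b)
x = back_substitution(U, c)
```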


It is satisfying that L and U replace A with no extra storage space. The triangular matrices
can take the $n^2$ storage locations that originally held A. The l's fit in where U has zeros
(below the diagonal). The whole discussion is only looking to see what elimination actually
did.


The Cost of Elimination 


A very practical question is cost—or computing time. Can we solve 1000 equations on a 
PC? What if n = 10,000? Large systems come up all the time in scientific computing, 
where a three-dimensional problem can easily lead to a million unknowns. We can let the 
calculation run overnight, but we can’t leave it for 100 years. 

It is quite easy to estimate the number of individual multiplications that go into Gaus- 
sian elimination. Concentrate on the left side (the expensive side, where A changes to U). 
[Figure: an n by n square of area $n^2$, plus an n-1 by n-1 square of area $(n-1)^2$, and so on down to 1 by 1, stacked into a pyramid of volume $\frac{1}{3}n^3$.]

Figure 2.5 Elimination needs about $\frac{1}{3}n^3$ multiplications and subtractions. Exact count
$(n^2 - n) + \cdots + (1^2 - 1) = \frac{1}{3}n^3 - \frac{1}{3}n$.


The first stage produces zeros below the first pivot; it clears out the first column. To
change an entry in A requires one multiplication and one subtraction. We will count the
whole first stage as $n^2$ multiplications and $n^2$ subtractions. It is actually less, $n^2 - n$,
because row 1 does not change. Now column 1 is set.

The next stage clears out the second column below the second pivot. The matrix is
now of size n - 1. Estimate it by $(n-1)^2$ multiplications and subtractions. The matrices
are getting smaller as elimination goes forward. Since the matrix has n columns, the rough
count to reach U is $n^2 + (n-1)^2 + \cdots + 2^2 + 1^2$.

There is an exact formula for this sum. It happens to be $\frac{1}{3}n(n + \frac{1}{2})(n + 1)$. For
large n, the $\frac{1}{2}$ and the 1 are not important. The number that matters is $\frac{1}{3}n^3$, and it comes
immediately from the square pyramid in Figure 2.5. The base shows the $n^2$ steps for
stage 1. The height shows the n stages as the working matrix gets smaller. The volume is a
good estimate for the total number of steps. The volume of an n by n by n pyramid is $\frac{1}{3}n^3$:


Elimination on A requires about $\frac{1}{3}n^3$ multiplications and $\frac{1}{3}n^3$ subtractions.
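
The count can be tallied stage by stage. The sketch below is illustrative Python (the function name is hypothetical) and follows the counting convention of Figure 2.5: a stage with a k by k working matrix is charged $k^2 - k$ multiplications.

```python
def count_elimination_multiplications(n):
    """Multiplications in forward elimination on a full n by n matrix,
    charging k*k - k to the stage whose working matrix is k by k."""
    count = 0
    for k in range(n, 0, -1):      # working matrix shrinks from n by n down to 1 by 1
        rows_changed = k - 1       # every row below the pivot row
        entries_per_row = k        # one multiplication per entry across the row
        count += rows_changed * entries_per_row
    return count

counts = {n: count_elimination_multiplications(n) for n in (10, 100, 1000)}
ratio = counts[1000] / (1000 ** 3 / 3)   # how close the exact count is to n^3 / 3
```

For n = 1000 the ratio differs from 1 only in the sixth decimal place, which is why the lower-order term is dropped.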


How long does it take to factor A into LU with matrices of order n = 100? We used
the MATLAB command t = clock; lu(A); etime(clock, t). The elapsed time on a SUN
Sparcstation 1 was one second. For n = 200 the elapsed time was eight seconds. (This
follows the $n^3$ rule! The time t is multiplied by 8 when n is multiplied by 2.) The matrix A
was random. Starting with the command for n = 10:10:100, A = rand(n, n); you could
measure times for n = 10, 20, ..., 100.

According to the $n^3$ rule, matrices of order n = 1000 will take $10^3$ seconds. Matrices
of order 10,000 will take $10^6$ seconds. This is very expensive but remember that these
matrices are full. Most matrices in practice are sparse (many zero entries). In that case
A = LU is very much faster. For tridiagonal matrices of order 10,000, solving Ax = b is
a breeze.
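
To see why the tridiagonal case is so cheap, note that each column has only one entry below the pivot, so each stage needs one multiplier. The following Python sketch is an illustration under that assumption (no row exchanges, nonzero pivots); the function name and argument layout are hypothetical, not taken from the book's codes.

```python
from fractions import Fraction

def solve_tridiagonal(lower, diag, upper, b):
    """Solve a tridiagonal system by elimination without row exchanges.
    lower[i] is the entry left of diag[i+1]; upper[i] is the entry right of diag[i]."""
    n = len(diag)
    d = [Fraction(v) for v in diag]
    c = [Fraction(v) for v in b]
    for i in range(1, n):
        m = Fraction(lower[i - 1]) / d[i - 1]   # the only multiplier in this column
        d[i] -= m * upper[i - 1]                # new pivot
        c[i] -= m * c[i - 1]                    # same step on the right side
    x = [Fraction(0)] * n
    x[n - 1] = c[n - 1] / d[n - 1]
    for i in range(n - 2, -1, -1):              # back substitution, one term per row
        x[i] = (c[i] - upper[i] * x[i + 1]) / d[i]
    return x

n = 5
x = solve_tridiagonal([-1] * (n - 1), [2] * n, [-1] * (n - 1), [1] * n)
```

The work grows in proportion to n rather than $n^3$, since both loops visit each row once.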

We also ran lu(A) with the Student Version of MATLAB on a 12 megahertz 386 PC.
For the maximum size n = 32, the elapsed time was 1 second. That includes the time to
format the answer. If you experiment with for n = 5:30, A = rand(n, n); you will see the
fixed cost for small matrices.
These are 1993 times, and speeds are going up. The day we started writing this
was the day that IBM announced it would open a new laboratory, to design massively
parallel computers including their RISC technology. After “fierce internal debate,” they
decided to tie together thousands of processors, as the Connection Machine and the Intel
Hypercube are doing now. Every numerical algorithm, even Gaussian elimination, has to
be reorganized to run well on a parallel machine.


What about the right side b? The first stage subtracts multiples of $b_1$ from the lower
components $b_2, \ldots, b_n$. This is n - 1 steps. The second stage takes only n - 2 steps,
because $b_1$ is not involved. The last stage of forward elimination (down to a 2 by 2 system)
takes one step on the right side and produces c.

Now start back substitution. Computing $x_n$ uses one step (divide by the last pivot).
The next unknown uses two steps. The first unknown $x_1$ requires n steps (n - 1 substitutions
of the other unknowns, then division by the first pivot). The total count on the right side,
forward to the bottom and back to the top, is exactly $n^2$:


\[ (n-1) + (n-2) + \cdots + 1 + 1 + 2 + \cdots + (n-1) + n = n^2. \tag{2.23} \]


To see that sum, pair off (n - 1) with 1 and (n - 2) with 2. The pairings finally leave n
terms each equal to n. That makes $n^2$. The right side costs a lot less than the left side!


Each right side (from b to c to x) needs $n^2$ multiplications and $n^2$ subtractions.


Here are the MATLAB codes slu(A) to factor A = LU and slv(A,b) to solve Ax = b. 
The program slu stops if zero appears in a pivot position (that fact is printed). These M- 
files are on the diskette that is available at no cost from MathWorks, to run with the Student 
Version of MATLAB. 


M-file: slu

function [L, U] = slu(A)
% SLU  Square LU factorization with no row exchanges.

[n, n] = size(A);
tol = 1.e-6;
for k = 1:n
   if abs(A(k,k)) < tol
      disp(['Small pivot in column ' int2str(k)])
   end
   L(k,k) = 1;
   for i = k+1:n
      L(i,k) = A(i,k)/A(k,k);
      for j = k+1:n
         A(i,j) = A(i,j) - L(i,k)*A(k,j);
      end
   end
   for j = k:n
      U(k,j) = A(k,j);
   end
end


M-file: slv

function x = slv(A, b)
% SLV  Solve Ax = b using L and U from slu(A). No row exchanges!

[L, U] = slu(A);

% Forward elimination to solve Lc = b.
% L is lower triangular with 1's on the diagonal.

[n, n] = size(A);
for k = 1:n
   s = 0;
   for j = 1:k-1
      s = s + L(k,j)*c(j);
   end
   c(k) = b(k) - s;
end

% Back substitution to solve Ux = c.
% U is upper triangular with nonzeros on the diagonal.

for k = n:-1:1
   t = 0;
   for j = k+1:n
      t = t + U(k,j)*x(j);
   end
   x(k) = (c(k) - t)/U(k,k);
end
x = x';


Problem Set 2.5


Problems 1-8 compute the factorization A = LU (and also A = LDU).


1 What matrix E puts A into triangular form EA = U? Multiply by $E^{-1} = L$ to
factor A into LU:


2 i 
A=|0 4 
6 3 


MAN © 
What two elimination matrices $E_{21}$ and $E_{32}$ put A into upper triangular form
$E_{32}E_{21}A = U$? Multiply by $E_{32}^{-1}$ and $E_{21}^{-1}$ to factor A into $E_{21}^{-1}E_{32}^{-1}U$ which is LU.
Find L and U for


\[ A = \begin{bmatrix} 1 & 1 & 1 \\ 2 & 4 & 5 \\ 0 & 4 & 0 \end{bmatrix} \]


What three elimination matrices $E_{21}$, $E_{31}$, $E_{32}$ put A into upper triangular form
$E_{32}E_{31}E_{21}A = U$? Multiply by $E_{32}^{-1}$, $E_{31}^{-1}$ and $E_{21}^{-1}$ to factor A into $E_{21}^{-1}E_{31}^{-1}E_{32}^{-1}U$
which is LU. Find L and U:


1 O 1 
A= | 2 2 
3 4 5 


Suppose A is already lower triangular with 1's on the diagonal. Then U = I!


\[ A = \begin{bmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ b & c & 1 \end{bmatrix} \]


The elimination matrices $E_{21}$, $E_{31}$, $E_{32}$ contain -a then -b then -c.


(a) Multiply $E_{32}E_{31}E_{21}$ to find the single matrix E that produces EA = I.
(b) Multiply $E_{21}^{-1}E_{31}^{-1}E_{32}^{-1}$ to find the single matrix L that gives A = LU (or LI).


When zero appears in a pivot position, the factorization A = LU is not possible! (We 
are requiring nonzero pivots in U.) Show directly why these are both impossible: 


0 1 1 Olfd e oe j 
2 3| |z LO f RR 2 j 
1 2 1 m n I 


This difficulty is fixed by a row exchange. 


~. D 09 


Which number c leads to zero in the second pivot position? A row exchange is 
needed and A = LU is not possible. Which c produces zero in the third pivot 
position? Then a row exchange can’t help and elimination fails: 


\[ A = \begin{bmatrix} 1 & c & 0 \\ 2 & 4 & 1 \\ 3 & 5 & 1 \end{bmatrix} \]


What are L and D for this matrix A? What is U in A = LU and what is the new U
in A = LDU?


> 
| 
O ON 
OWA 
1 © o 
A and B are symmetric across the diagonal (because 4 = 4). Find their factorizations
LDU and say how U is related to L:


1 4 0 
ra | and B=|4 12 4 
0 4 O 


> 

II 
a a AQ A 
OS Ga Beng 
NV A SRRA 


aS oe & 


Find four conditions on a, b, c, d to get A = LU with four pivots. 


Find L and U for the nonsymmetric matrix 


XY A aA NSN 
aSa NS 


Soro xN 


Find the four conditions on a, b, c, d, r, s, t to get A = LU with four pivots. 


Problems 11-12 use L and U (without needing A) to solve Ax = b.


11 Solve the triangular system Lc = b to find c. Then solve Ux = c to find x:


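\[ L = \begin{bmatrix} 1 & 0 \\ 4 & 1 \end{bmatrix} \quad\text{and}\quad U = \begin{bmatrix} 2 & 4 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 2 \\ 11 \end{bmatrix} \]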


For safety find A = LU and solve Ax = b as usual. Circle c when you see it. 


12 Solve Lc = b to find c. Then solve Ux = c to find x. What was A?


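\[ L = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \quad\text{and}\quad U = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} \]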


13 (a) When you apply the usual elimination steps to L, what matrix do you reach?


\[ L = \begin{bmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{bmatrix} \]


(b) When you apply the same steps to I, what matrix do you get?
(c) When you apply the same steps to LU, what matrix do you get?


14 If A = LDU and also $A = L_1D_1U_1$ with all factors invertible, then $L = L_1$ and
$D = D_1$ and $U = U_1$. “The factors are unique.”


(a) Derive the equation $L_1^{-1}LD = D_1U_1U^{-1}$. Are the two sides lower or upper
triangular or diagonal?

(b) Show that the main diagonals in that equation give $D = D_1$. Why does $L = L_1$?


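15 Tridiagonal matrices have zero entries except on the main diagonal and the two
adjacent diagonals. Factor these into A = LU and $A = LDL^T$:


\[ A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} a & a & 0 \\ a & a+b & b \\ 0 & b & b+c \end{bmatrix} \]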


16 When T is tridiagonal, its L and U factors have only two nonzero diagonals. How
would you take advantage of the zeros in T in a computer code for Gaussian elimination?
Find L and U.


Uw Ne O 
Aa UOO 


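17 If A and B have nonzeros in the positions marked by x, which zeros (marked by 0)
are still zero in their factors L and U?


\[ A = \begin{bmatrix} x & x & x & x \\ x & x & x & 0 \\ 0 & x & x & x \\ 0 & 0 & x & x \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} x & x & x & 0 \\ x & x & 0 & x \\ x & 0 & x & x \\ 0 & x & x & x \end{bmatrix} \]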


18 After elimination has produced zeros below the first pivot, put x's to show which
blank entries are known in L and U:


19 Suppose you eliminate upwards (almost unheard of). Use the last row to produce
zeros in the last column (the pivot is 1). Then use the second row to produce zero
above the second pivot. Find the factors in A = UL(!):
20 Collins uses elimination in both directions, forward (down) and backward (up), meeting
at the center. Substitution goes out from the center. After eliminating both 2's
in A, one from above and one from below, what 4 by 4 matrix is left? Solve Ax = b
his way.


and b= 


O O N = 
O TES y = 
— (W = © 
m= VY O O 
N oO CO WN 


21 (Important) If A has pivots 2, 7, 6 with no row exchanges, what are the pivots for the
upper left 2 by 2 submatrix B (without row 3 and column 3)? Explain why.


22 Starting from a 3 by 3 matrix A with pivots 2, 7, 6, add a fourth row and column to
produce M. What are the first three pivots for M, and why? What fourth row and
column are sure to produce 9 as the fourth pivot?


23 MATLAB knows the n by n matrix pascal(n). Find its LU factors for n = 4 and 5
and describe their pattern. Use chol(pascal(n)) or slu(A) above or work by hand.
The row exchanges in MATLAB's lu code spoil the pattern, but Cholesky (= chol)
doesn't:


\[ A = \text{pascal}(4) = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{bmatrix} \]


24 (Careful review) For which c is A = LU impossible, with three pivots?

\[ A = \begin{bmatrix} 1 & 2 & 0 \\ 3 & c & 1 \\ 0 & 1 & 1 \end{bmatrix} \]


25 Change the program slu(A) into sldu(A), so that it produces the three matrices L, D,
and U.


26 Rewrite slu(A) so that the factors L and U appear in the same $n^2$ storage locations
that held the original A. The extra storage used for L is not required.


27 Explain in words why x(k) is (c(k) - t)/U(k, k) at the end of slv(A, b).


28 Write a program that multiplies triangular matrices, L times U. Don't loop from 1
to n when you know there are zeros in the matrices. Somehow L times U should
undo the operations in slu.
2.6 Transposes and Permutations


We need one more matrix, and fortunately it is much simpler than the inverse. It is the
“transpose” of A, which is denoted by $A^T$. Its columns are the rows of A. When A is an
m by n matrix, the transpose is n by m:


\[ \text{If } A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 4 \end{bmatrix} \text{ then } A^T = \begin{bmatrix} 1 & 0 \\ 2 & 0 \\ 3 & 4 \end{bmatrix}. \]


You can write the rows of A into the columns of $A^T$. Or you can write the columns of A
into the rows of $A^T$. The matrix “flips over” its main diagonal. The entry in row i, column j
of $A^T$ comes from row j, column i of the original A:


\[ (A^T)_{ij} = A_{ji}. \]


The transpose of a lower triangular matrix is upper triangular. The transpose of $A^T$ is A.


Note MATLAB's symbol for the transpose is A'. To enter a column vector, type v = [1 2 3]'.
To enter a matrix with second column w = [4 5 6]' you could define M = [v w].
Quicker to enter by rows and then transpose the whole matrix: M = [1 2 3; 4 5 6]'.


The rules for transposes are very direct. We can transpose A + B to get $(A + B)^T$.
Or we can transpose A and B separately, and then add $A^T + B^T$, with the same result. The serious
questions are about the transpose of a product AB and an inverse $A^{-1}$:


\[
\begin{aligned}
\text{The transpose of } A + B \text{ is } &\ A^T + B^T. && (2.24) \\
\text{The transpose of } AB \text{ is } &\ (AB)^T = B^TA^T. && (2.25) \\
\text{The transpose of } A^{-1} \text{ is } &\ (A^{-1})^T = (A^T)^{-1}. && (2.26)
\end{aligned}
\]


Notice especially how $B^TA^T$ comes in reverse order like $B^{-1}A^{-1}$. The proof for the
inverse is quick: $B^{-1}A^{-1}$ times AB produces I. To see this reverse order for $(AB)^T$, start
with $(Ax)^T$:


Ax combines the columns of A while $x^TA^T$ combines the rows of $A^T$.


It is the same combination of the same vectors! In A they are columns, in $A^T$ they are rows.
So the transpose of the column Ax is the row $x^TA^T$. That fits our formula $(Ax)^T = x^TA^T$.
Now we can prove the formulas for $(AB)^T$ and $(A^{-1})^T$.

When $B = [x_1\ \ x_2]$ has two columns, apply the same idea to each column. The
columns of AB are $Ax_1$ and $Ax_2$. Their transposes are $x_1^TA^T$ and $x_2^TA^T$. Those are the
rows of $B^TA^T$:


\[ \text{Transposing } AB = \begin{bmatrix} Ax_1 & Ax_2 & \cdots \end{bmatrix} \text{ gives } \begin{bmatrix} x_1^TA^T \\ x_2^TA^T \\ \vdots \end{bmatrix} \text{ which is } B^TA^T. \tag{2.27} \]
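The right answer $B^TA^T$ comes out a row at a time. Maybe numbers are the easiest:


\[
AB = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 5 & 0 \\ 4 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 0 \\ 9 & 1 \end{bmatrix}
\quad\text{and}\quad
B^TA^T = \begin{bmatrix} 5 & 4 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 9 \\ 0 & 1 \end{bmatrix}.
\]

The rule extends to three or more factors: $(ABC)^T$ equals $C^TB^TA^T$.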


Now apply this product rule to both sides of $A^{-1}A = I$. On one side, $I^T$ is just I.
On the other side, we discover that $(A^{-1})^T$ is the inverse of $A^T$:


\[ A^{-1}A = I \quad\text{leads to}\quad A^T(A^{-1})^T = I. \tag{2.28} \]


Similarly $AA^{-1} = I$ leads to $(A^{-1})^TA^T = I$. Notice especially: $A^T$ is invertible exactly
when A is invertible. We can invert first or transpose first, it doesn't matter.
A and $A^T$ have the same pivots, when there are no row exchanges:


If A = LDU then $A^T = U^TD^TL^T$. The pivot matrix $D = D^T$ is the same.


Example 2.17 The inverse of $A = \begin{bmatrix} 1 & 0 \\ 6 & 1 \end{bmatrix}$ is $A^{-1} = \begin{bmatrix} 1 & 0 \\ -6 & 1 \end{bmatrix}$. The transpose is $A^T = \begin{bmatrix} 1 & 6 \\ 0 & 1 \end{bmatrix}$.

\[ (A^{-1})^T \text{ equals } \begin{bmatrix} 1 & -6 \\ 0 & 1 \end{bmatrix} \text{ which is } (A^T)^{-1}. \]
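
Both rules can be checked on small matrices. The sketch below is illustrative Python with hypothetical helper names (transpose, matmul); the 2 by 2 matrices are chosen for illustration, with A2 and A2_inv being a lower triangular matrix with multiplier 6 and its inverse.

```python
def transpose(A):
    """Rows of A become columns."""
    return [list(col) for col in zip(*A)]

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

# Product rule: (AB)^T should equal B^T A^T, in reverse order.
A = [[1, 0], [1, 1]]
B = [[5, 0], [4, 1]]
left = transpose(matmul(A, B))
right = matmul(transpose(B), transpose(A))

# Inverse rule: (A^{-1})^T should be the inverse of A^T.
A2 = [[1, 0], [6, 1]]
A2_inv = [[1, 0], [-6, 1]]
check = matmul(transpose(A2_inv), transpose(A2))   # expected to be the identity
```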


Before leaving these rules, we call attention to dot products. The following statement
looks extremely simple, but it actually contains the deep purpose for the transpose. For any
vectors x and y,


\[ (Ax)^Ty \quad\text{equals}\quad x^TA^Ty \quad\text{equals}\quad x^T(A^Ty). \tag{2.29} \]


We can put in parentheses or leave them out. In electrical networks, x is the vector of
potentials and y is the vector of currents. Ax gives potential differences and $A^Ty$ is the
current into nodes. Every vector has a meaning; the dot product in equation (2.29) is the
heat loss.

In engineering and physics, x can be the displacement of a structure under a load.
Then Ax is the stretching (the strain). When y is the internal stress, $A^Ty$ is the external
force. Equation (2.29) is a statement about dot products and also a balance equation:


Internal work (strain · stress) = external work (displacement · force).


In economics, x gives the amounts of n outputs. The m inputs to produce them
are Ax. The costs per input go into y. Then the values per output are the components
of $A^Ty$:

Total input cost $Ax \cdot y$ equals total output value $x \cdot A^Ty$.


Problems 31-33 bring out applications. Here we only emphasize the transpose. When A
moves from one side of a dot product to the other side, it becomes $A^T$.
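
A quick numerical check of equation (2.29), as a minimal Python sketch. The 3 by 2 matrix and the vectors are made-up illustrative values, and the helper names are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    """Matrix times vector: one dot product per row."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[1, 2], [0, 3], [4, -1]]    # m = 3 by n = 2, illustrative values
x = [2, 5]                       # n components
y = [1, -2, 3]                   # m components
lhs = dot(matvec(A, x), y)               # (Ax) . y
rhs = dot(x, matvec(transpose(A), y))    # x . (A^T y)
```

Both sides come out to the same number, although Ax lives in m dimensions and $A^Ty$ lives in n dimensions.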
Symmetric Matrices


For some matrices (these are the most important matrices) transposing A to $A^T$ produces
no change. Then $A^T = A$. The matrix is symmetric across the main diagonal. A symmetric
matrix is necessarily square. Its (j, i) and (i, j) entries are equal.


Definition A symmetric matrix has $A^T = A$. This means that $A_{ji} = A_{ij}$.


Example 2.18 $A = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix} = A^T$ and $D = \begin{bmatrix} 1 & 0 \\ 0 & 10 \end{bmatrix} = D^T$.
A is symmetric because of the 2’s on opposite sides of the diagonal. In D those 2’s are 
zeros. A diagonal matrix is automatically symmetric. 

The inverse of a symmetric matrix is also symmetric. (We have to add: If $A = A^T$
has an inverse.) When $A^{-1}$ is transposed, equation (2.26) gives $(A^T)^{-1}$ which is $A^{-1}$
again. The inverses of A and D are certainly symmetric:


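\[ A^{-1} = \begin{bmatrix} 5 & -2 \\ -2 & 1 \end{bmatrix} = (A^{-1})^T \quad\text{and}\quad D^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 0.1 \end{bmatrix} = (D^{-1})^T. \]
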
Now we show how symmetric matrices come from nonsymmetric matrices and their trans- 
poses. 


Symmetric Products $R^TR$ and $RR^T$ and $LDL^T$


Choose any matrix R, probably rectangular. Multiply $R^T$ times R. Then the product $R^TR$
is automatically a symmetric matrix:


\[ \text{The transpose of } R^TR \text{ is } R^T(R^T)^T \text{ which is } R^TR. \tag{2.30} \]


In words, the (i, j) entry of $R^TR$ is the dot product of column i of R with column j of R.
The (j, i) entry is the same dot product. So $R^TR$ is symmetric.

The matrix $RR^T$ is also symmetric. (The shapes of R and $R^T$ allow multiplication.)
But $RR^T$ is a different matrix from $R^TR$. In our experience, most scientific problems that
start with a rectangular matrix R end up with $R^TR$ or $RR^T$ or both.


Example 2.19 R=[1 2 2] and R™R=[344] and RRT=[9]. 


The product $R^TR$ is n by n. In the opposite order, $RR^T$ is m by m. Even if m = n, it is not
likely that $R^TR = RR^T$. Equality can happen, but it is abnormal.
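
The shapes and the symmetry are easy to confirm on a rectangular matrix. The sketch below is illustrative Python; the 2 by 3 matrix R is a made-up example and the helper names are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

R = [[1, 2, 3], [4, 5, 6]]        # hypothetical 2 by 3 matrix (m = 2, n = 3)
RtR = matmul(transpose(R), R)     # n by n: dot products of columns of R
RRt = matmul(R, transpose(R))     # m by m: dot products of rows of R
```

Both products equal their own transposes, but they have different sizes, so they cannot be the same matrix here.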


When elimination is applied to a symmetric matrix, we hope that $A^T = A$ is an advantage.
That depends on the smaller matrices staying symmetric as elimination proceeds, which
they do. It is true that the upper triangular U cannot be symmetric. The symmetry is not
in LU, it is in LDU. Remember how the diagonal matrix of pivots can be divided out
of U, to leave 1's on the diagonal.


\[
\begin{aligned}
\begin{bmatrix} 1 & 2 \\ 2 & 7 \end{bmatrix} &= \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} && \text{(LU misses the symmetry)} \\
&= \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} && \text{(LDU captures the symmetry because } U = L^T\text{).}
\end{aligned}
\]


When A is symmetric, the usual form A = LDU becomes $A = LDL^T$. The final U (with
1's on the diagonal) is the transpose of L (also with 1's on the diagonal). The diagonal D,
the matrix of pivots, is symmetric by itself.


2K If $A = A^T$ can be factored into LDU with no row exchanges, then $U = L^T$.
The symmetric factorization is $A = LDL^T$.


Notice that the transpose of $LDL^T$ is automatically $(L^T)^TD^TL^T$ which is $LDL^T$
again. We have a symmetric factorization for a symmetric matrix. The work of elimination
is cut essentially in half, from $n^3/3$ multiplications to $n^3/6$. The storage is also cut
essentially in half. We don't have to store entries above the diagonal, or work on them, because
they are known from entries below the diagonal.
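
One way to see the saving is a routine that never reads the entries above the diagonal. This is a hedged Python sketch, not the book's code: the name ldlt is illustrative, and it assumes a symmetric matrix whose pivots are all nonzero without row exchanges.

```python
from fractions import Fraction

def ldlt(A):
    """Factor symmetric A as L * diag(d) * L^T with no row exchanges.
    Only entries on and below the diagonal of A are read."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    d = [Fraction(0)] * n
    for j in range(n):
        d[j] = Fraction(A[j][j]) - sum(L[j][k] ** 2 * d[k] for k in range(j))   # pivot j
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (Fraction(A[i][j]) - s) / d[j]                            # multiplier l_ij
    return L, d

L, d = ldlt([[1, 2], [2, 7]])                              # the 2 by 2 example from the text
L3, d3 = ldlt([[2, -1, 0], [-1, 2, -1], [0, -1, 2]])       # the -1, 2, -1 matrix
```

The pivots in d and the multipliers in L match what ordinary elimination produces, so U never has to be stored separately.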


Permutation Matrices 


The transpose plays a special role for permutation matrices. These are matrices P with a
single “1” in every row and every column. Then $P^T$ is also a permutation matrix, maybe
the same or maybe different. Any product $P_1P_2$ is again a permutation matrix. We now
create every P from the identity matrix, by reordering the rows.

The simplest permutation matrix is P = I (no exchanges). The next simplest are the
row exchanges $P_{ij}$. Those are constructed by exchanging two rows i and j of I. Other
permutations exchange three rows, ijk to jki or jik (those are different). By doing all
possible row exchanges to I, we get all possible permutation matrices:


Definition An n by n permutation matrix P has the rows of I (n rows) in any order. 


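Example 2.20 There are six 3 by 3 permutation matrices. Here they are without the zeros:


\[
I = \begin{bmatrix} 1 & & \\ & 1 & \\ & & 1 \end{bmatrix} \qquad
P_{21} = \begin{bmatrix} & 1 & \\ 1 & & \\ & & 1 \end{bmatrix} \qquad
P_{32}P_{21} = \begin{bmatrix} & 1 & \\ & & 1 \\ 1 & & \end{bmatrix}
\]

\[
P_{31} = \begin{bmatrix} & & 1 \\ & 1 & \\ 1 & & \end{bmatrix} \qquad
P_{32} = \begin{bmatrix} 1 & & \\ & & 1 \\ & 1 & \end{bmatrix} \qquad
P_{21}P_{32} = \begin{bmatrix} & & 1 \\ 1 & & \\ & 1 & \end{bmatrix}
\]


There are n! permutation matrices of order n. The symbol n! stands for “n factorial,” the
product of the numbers (1)(2) ··· (n). Thus 3! = (1)(2)(3) which is 6.
There are only two permutation matrices of order 2, namely $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.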
Important: The inverse of a permutation matrix is also a permutation matrix. In the
example above, the four matrices on the left are their own inverses. The two matrices on
the right are inverses of each other. In all cases, a single row exchange $P_{21}$ or $P_{ij}$ is its own
inverse. It exchanges the rows back again. But for a product like $P_{32}P_{21}$, the inverses go
in opposite order (of course). The inverse is $P_{21}P_{32}$.


More important: $P^{-1}$ is the same as $P^T$. The four matrices on the left above are their own
transposes. The two matrices on the right are transposes (and inverses) of each other.
You can check directly that P times $P^T$ equals I. The “1” in the first row of P hits the “1”
in the first column of $P^T$. It misses the ones in all the other columns. So $PP^T = I$.
Another proof of $P^T = P^{-1}$ is in the following reasoning:


P is a product of row exchanges. A row exchange is its own transpose and its own inverse.
$P^T$ is the product in the opposite order. So is $P^{-1}$. Therefore $P^T = P^{-1}$.
Symmetric matrices led to $A = LDL^T$. Now permutations lead to PA = LU.
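
These counts and the rule $P^T = P^{-1}$ can be confirmed by listing every 3 by 3 permutation matrix. The sketch below is illustrative Python using the standard itertools module; the helper names are hypothetical.

```python
from itertools import permutations

def permutation_matrix(order):
    """Row i of P is row order[i] of the identity."""
    n = len(order)
    return [[1 if j == order[i] else 0 for j in range(n)] for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

n = 3
identity = [[int(i == j) for j in range(n)] for i in range(n)]
all_P = [permutation_matrix(p) for p in permutations(range(n))]     # n! matrices
own_inverse = [P for P in all_P if matmul(P, P) == identity]        # P times P gives I
transpose_is_inverse = all(matmul(P, transpose(P)) == identity for P in all_P)
```

For n = 3 the list has six matrices, four of which are their own inverses, in agreement with Example 2.20.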


The LU Factorization with Row Exchanges 


We hope you remember A = LU. It started with $A = (E_{21}^{-1} \cdots E_{ij}^{-1} \cdots)U$. Every
elimination step was carried out by an $E_{ij}$ and inverted by an $E_{ij}^{-1}$. Those inverses were
compressed into one matrix L. The lower triangular L has 1's on the diagonal, and the result is
A = LU.

This is a great factorization, but it doesn't always work, because of row exchanges.
If exchanges are needed then $A = (E^{-1} \cdots P^{-1} \cdots E^{-1} \cdots P^{-1} \cdots)U$. Every row
exchange is carried out by a $P_{ij}$ and inverted by that $P_{ij}$. We now compress those row
exchanges into a single permutation matrix. This gives a factorization for every invertible
matrix A, which we naturally want.

The main question is where to collect the $P_{ij}$'s. There are two good possibilities: do
them before elimination, or do them after the $E_{ij}$'s. One way gives PA = LU, the other
way has the permutation in the middle.


1 The row exchanges can be moved to the left side, onto A. We think of them as
done in advance. Their product P puts the rows of A in the right order, so that no
exchanges are needed for PA. Then PA factors into LU.


2 We can hold all row exchanges until after forward elimination. This leaves the pivot
rows in a strange order. $P_1$ puts them in the right order which produces $U_1$ (upper
triangular as usual). Then $A = L_1P_1U_1$.


PA = LU is used in almost all computing (and always in MATLAB). The form
$A = L_1P_1U_1$ is the right one for theoretical algebra; it is definitely more elegant. If we
give space to both, it is because the difference is not well known. Probably you will not
spend a long time on either one. Please don't. The most important case by far has P = I,
when A equals LU with no exchanges.
The algebraist's form $A = L_1P_1U_1$ Suppose $a_{11} = 0$ but $a_{21}$ is nonzero. Then row 2
is the pivot row; the row exchange waits! Produce zeros below the pivot $a_{21}$ in the usual
way. Now choose the first available pivot in column 2. Row 1 is available (use it if $a_{12}$
is nonzero). Row 2 is not available (it was already a pivot row). By accepting the first
available pivot, every step subtracts a pivot row from a lower row.


\[
A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 2 & 1 \\ 2 & 7 & 9 \end{bmatrix}
\xrightarrow{\ l_{32} = 2\ }
\begin{bmatrix} 0 & 1 & 1 \\ 1 & 2 & 1 \\ 0 & 3 & 7 \end{bmatrix}
\xrightarrow{\ l_{31} = 3\ }
\begin{bmatrix} 0 & 1 & 1 \\ 1 & 2 & 1 \\ 0 & 0 & 4 \end{bmatrix}. \tag{2.31}
\]

Forward elimination ends there. The last matrix is not $U_1$, but by exchanging two rows it
becomes $U_1$. So that last matrix is $P_1U_1$. The two elimination steps are undone by $L_1$.
The whole factorization is


\[
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 4 \end{bmatrix} = L_1P_1U_1. \tag{2.32}
\]

The computer’s form PA = LU To put the pivots on the diagonal, do the row ex- 
changes first. For this matrix, exchange rows 1 and 2 to put the first pivot in its usual place. 
Then go through elimination: 


i-ai 121 121 
pa=|0 1 11 — /o 11] — Jott 
ne ae S 03 7| 23 {190 4 


In this case P equals P; and U equals U; (not always). The matrix L is different: 
1 0 0 t 2.4 
PA=;}0 1 0O7;;0 1 1]|=ZU. (2.33) 
“wo 0 0 4 


When P comes in advance, the l's are moved around. The numbers 2 and 3 are reversed.
We could establish a general rule to connect P, L, U to P_1, L_1, U_1 but we won't. We feel
safer when we do elimination on PA to find LU.
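Both factorizations of the 3 by 3 example can be checked by multiplying the factors back together. The sketch below is ours and is written in Python rather than MATLAB; the helper name matmul is an assumption made for illustration and is not one of the Teaching Codes. The matrices are the ones in (2.32) and (2.33).

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[0, 1, 1], [1, 2, 1], [2, 7, 9]]

# Algebraist's form (2.32): A = L1 P1 U1
L1 = [[1, 0, 0], [0, 1, 0], [3, 2, 1]]
P1 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
U1 = [[1, 2, 1], [0, 1, 1], [0, 0, 4]]

# Computer's form (2.33): P A = L U (here P = P1 and U = U1)
P = P1
L = [[1, 0, 0], [0, 1, 0], [2, 3, 1]]
U = U1

algebraist_ok = matmul(L1, matmul(P1, U1)) == A
computer_ok = matmul(P, A) == matmul(L, U)
print(algebraist_ok, computer_ok)
```

The only difference between L_1 and L is the order of the multipliers 2 and 3 in the last row, as the text says.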


2L (PA = LU) If A is invertible, a permutation P will put its rows in the right order to
factor PA into LU. There is a full set of pivots and Ax = b can be solved.


In the MATLAB code, watch for these lines that exchange row k with row r below 
it (where the kth pivot has been found). The notation A(k,1:n) picks out row k and all 
columns 1 to n—to produce the whole row. The permutation P starts as the identity matrix, 
which is eye(n,n): 


A([r k],1:n) = A([k r],1:n);
L([r k],1:k-1) = L([k r],1:k-1);
P([r k],1:n) = P([k r],1:n);
sign = -sign

The "sign" of P tells whether the number of row exchanges is even (sign = +1) or
odd (sign = -1). At the start, P is I and sign = +1. When there is a row exchange, the
sign is reversed. It gives the "determinant of P" and it does not depend on the order of
the row exchanges.
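To see that the sign depends only on the final permutation and not on the particular exchanges, here is a small Python sketch (ours, not part of the Teaching Codes). The function names exchange_rows and permutation_sign are hypothetical, and rows are numbered from 0 as is usual in Python.

```python
def exchange_rows(n, exchanges):
    """Apply row exchanges to the identity ordering, flipping sign each time."""
    order = list(range(n))
    sign = 1
    for r, k in exchanges:
        order[r], order[k] = order[k], order[r]
        sign = -sign
    return order, sign

def permutation_sign(order):
    """Sign of a permutation, found by sorting it back with exchanges."""
    order = list(order)
    sign = 1
    for k in range(len(order)):
        while order[k] != k:
            j = order[k]
            order[k], order[j] = order[j], order[k]  # puts value j in place j
            sign = -sign
    return sign

# Two different routes to the same row order: two exchanges, then four.
order_a, sign_a = exchange_rows(3, [(0, 1), (1, 2)])
order_c, sign_c = exchange_rows(3, [(0, 1), (0, 2), (1, 2), (0, 1)])
print(order_a, sign_a, order_c, sign_c, permutation_sign(order_a))
```

Both routes end at the same row order, so they must agree on the sign even though the number of exchanges differs.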


Summary A = L_1 P_1 U_1 is more elegant, because L_1 is constructed as you go. The first
available nonzero in each column becomes the pivot. It may not be on the main diagonal,
but pivot rows are still subtracted from lower rows. At the end, P_1 reorders the rows to put
the pivots on the diagonal of U_1.

For PA we get back to the familiar LU. This is the usual factorization. We should 
tell you: The computer does not always use the first available pivot. An algebraist accepts 
a small pivot—anything but zero. A computer looks down the column for the largest pivot. 
P may contain more row exchanges than are algebraically necessary, but we still have 
PA = LU.
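The difference between the algebraist's pivot and the computer's pivot can be sketched in a few lines. The Python function below is our own illustration (the name plu_partial is hypothetical); it looks down each column for the largest entry, which is one common strategy and not necessarily what any particular MATLAB routine does internally. Exact fractions are used so that the check PA = LU has no roundoff.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def plu_partial(A):
    """Factor PA = LU, taking the largest entry in each column as the pivot.

    Returns (P, L, U, exchanges). Raises ValueError if a column has no pivot.
    """
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    exchanges = 0
    for k in range(n):
        r = max(range(k, n), key=lambda i: abs(U[i][k]))
        if U[r][k] == 0:
            raise ValueError(f"No pivot in column {k + 1}")
        if r != k:
            U[k], U[r] = U[r], U[k]
            P[k], P[r] = P[r], P[k]
            # exchange the multipliers already stored in columns 0..k-1
            L[k][:k], L[r][:k] = L[r][:k], L[k][:k]
            exchanges += 1
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return P, L, U, exchanges

# This matrix needs no exchange algebraically: its first pivot is 1.
A = [[1, 2, 1], [0, 1, 1], [2, 7, 9]]
P, L, U, exchanges = plu_partial(A)
factorization_ok = matmul(P, A) == matmul(L, U)
print(exchanges, factorization_ok)
```

For this matrix the largest-pivot rule makes two exchanges where the algebraist would make none, and the factorization PA = LU still holds.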

Our advice is to understand permutations but let MATLAB do the computing. Cal-
culations of A = LU are enough to do by hand, without P. Here are the codes splu(A) to
factor PA = LU, and splv(A,b) to solve Ax = b for any square invertible A. The program
splu stops if no pivot can be found in column k. That fact is printed.

These M-files are on the diskette of Teaching Codes available from MathWorks. 

The program plu in Chapter 3 will drop the s and allow the matrix to be rectangular. 
We still get PA = LU. But there may not be n pivots on the diagonal of U, and the 
matrix A is generally not invertible. 
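The two triangular solves that splv performs can be sketched independently of the factorization. The Python function below is our illustration (the name solve_plu is hypothetical); it takes P, L, U as given, here the factors from (2.33), and the right side b is chosen by us so that the solution is (1, 1, 1).

```python
from fractions import Fraction

def solve_plu(P, L, U, b):
    """Solve Ax = b from PA = LU: first Lc = Pb, then Ux = c."""
    n = len(b)
    # Reorder the right side: Pb
    pb = [sum(P[i][j] * b[j] for j in range(n)) for i in range(n)]
    # Forward elimination on the right side: L has ones on its diagonal
    c = [Fraction(0)] * n
    for k in range(n):
        c[k] = Fraction(pb[k]) - sum(L[k][j] * c[j] for j in range(k))
    # Back substitution, from the last unknown up to the first
    x = [Fraction(0)] * n
    for k in reversed(range(n)):
        t = sum(U[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (c[k] - t) / U[k][k]
    return x

A = [[0, 1, 1], [1, 2, 1], [2, 7, 9]]
P = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
L = [[1, 0, 0], [0, 1, 0], [2, 3, 1]]
U = [[1, 2, 1], [0, 1, 1], [0, 0, 4]]

b = [2, 4, 18]                      # equals A times (1, 1, 1)
x = solve_plu(P, L, U, b)
residual = [sum(A[i][j] * x[j] for j in range(3)) - b[i] for i in range(3)]
print(x, residual)
```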


M-file: splu

function [P, L, U, sign] = splu(A)
% SPLU  Square PA = LU factorization with row exchanges.

[m,n] = size(A);
if m ~= n
  error('Matrix must be square.')
end
P = eye(n,n);
L = eye(n,n);
U = zeros(n,n);
tol = 1.e-6;
sign = 1;

for k = 1:n
  if abs(A(k,k)) < tol
    % Look down column k for the first entry that can serve as a pivot.
    for r = k:n
      if abs(A(r,k)) >= tol
        break
      end
      if r == n
        disp('A is singular')
        error(['No pivot in column ' int2str(k)])
      end
    end
    % Exchange rows r and k in A, in the finished part of L, and in P.
    A([r k],1:n) = A([k r],1:n);
    if k > 1, L([r k],1:k-1) = L([k r],1:k-1); end
    P([r k],1:n) = P([k r],1:n);
    sign = -sign;
  end
  % Subtract multiples of pivot row k from the rows below it.
  for i = k+1:n
    L(i,k) = A(i,k)/A(k,k);
    for j = k+1:n
      A(i,j) = A(i,j) - L(i,k)*A(k,j);
    end
  end
  % Row k of U; entries smaller than tol are set to zero.
  for j = k:n
    U(k,j) = A(k,j) * (abs(A(k,j)) >= tol);
  end
end

if nargout < 4
  roworder = P*(1:m)';
  disp('Pivots in rows:'), disp(roworder');
end


M-file: splv

function x = splv(A,b)
% SPLV  Solve Ax = b using P,L,U from splu(A).
% Actually solve PAx = Pb.  A is invertible!
% The MATLAB backslash operator A\b also finds x.

[P,L,U] = splu(A);
[n,n] = size(A);
b = P*b;

% Forward elimination to solve Lc = b.  Really Lc = Pb.
c = zeros(n,1);
for k = 1:n
  s = 0;
  for j = 1:k-1
    s = s + L(k,j)*c(j);
  end
  c(k) = b(k) - s;
end

% Back substitution to solve Ux = c.  Then PAx = LUx = Pb.
x = zeros(n,1);
for k = n:-1:1
  t = 0;
  for j = k+1:n
    t = t + U(k,j)*x(j);
  end
  x(k) = (c(k) - t)/U(k,k);
end


Problem Set 2.6 


Questions 1-8 are about the rules for transpose matrices. 


1 Find A^T and A^{-1} and (A^{-1})^T and (A^T)^{-1} for
1 0 1 1 
hel | and also a=|; J 


2 Verify that (AB)^T equals B^T A^T but does not equal A^T B^T:


E) Gi] Ei] 


In case AB = BA (not generally true!) prove that B^T A^T = A^T B^T.
3 The matrix ((AB)^{-1})^T comes from (A^{-1})^T and (B^{-1})^T. In what order?
4 Show that A^2 = 0 is possible but A^T A = 0 is not possible (unless A = 0).


* Transparent proof that (AB)^T = B^T A^T. Matrices can be transposed by looking
through the page from the other side. Hold up to the light and practice with B below.
Its column becomes a row in B^T. To see better, draw the figure with heavy lines on
thin paper and turn it over so the symbol B^T is upright.


The three matrices are in position for matrix multiplication: the row of A times the
column of B gives the entry in AB. Looking from the reverse side, the row of B^T
times the column of A^T gives the correct entry in B^T A^T = (AB)^T.

[Figure: the matrices A, B, and AB in position for matrix multiplication.]


5 (a) The row vector x^T times A times the column y produces what number?


0 
I' 2.3 
Tie = 
x Ay=[0 Jl : ] I| = l 
0 
(b) This is the row x^T A = ___ times the column y = (0, 1, 0).


(c) This is the row x^T = [0 1] times the column Ay = ___.

6 The components of Ax are \sum_{j=1}^{n} a_{ij} x_j. Its dot product with y is
\[
\sum_{i=1}^{m} \Big( \sum_{j=1}^{n} a_{ij} x_j \Big) y_i .
\]
The components of A^T y are \sum_{i=1}^{m} a_{ij} y_i. Its dot product with x is
\[
\sum_{j=1}^{n} \Big( \sum_{i=1}^{m} a_{ij} y_i \Big) x_j .
\]

The double sum can have j first or i first. Either way a_{ij} multiplies x_j and ___.
The key is (Ax)^T( ) = x^T( ).


7 When you transpose a block matrix M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} the result is M^T = ___. Test it.
8 True or false: 


(a) The block matrix \begin{bmatrix} 0 & A \\ A & 0 \end{bmatrix} is automatically symmetric.




(b) If A and B are symmetric then their product AB is symmetric. 
(c) If A is not symmetric then A^{-1} is not symmetric.


(d) When A, B, C are symmetric, the transpose of (ABC)^T is ABC.


Questions 9-14 are about permutation matrices. 


9 If P_1 and P_2 are permutation matrices, so is P_1 P_2. After two permutations we still
have the rows of I in some order. Give examples with P_1 P_2 ≠ P_2 P_1 and P_3 P_4 =
P_4 P_3.


10 There are 12 "even" permutations of (1, 2, 3, 4), including (1, 2, 3, 4) with no
exchanges and (4, 3, 2, 1) with two exchanges. What are the other ten with an even
number of exchanges? Instead of writing a 4 by 4 matrix, the numbers 4, 3, 2, 1 give
the position of the 1 in each row.


11 Which permutation matrix makes PA upper triangular? Which permutations make
P_1 A P_2 lower triangular?

\[
A = \begin{bmatrix} 0 & 0 & 6 \\ 1 & 2 & 3 \\ 0 & 4 & 5 \end{bmatrix}
\]


12 (a) Explain why the dot product of x and y equals the dot product of Px and Py.
P is any permutation matrix.

(b) With x = (1, 2, 3) and y = (1, 1, 2) show that Px · y is not always equal to
x · Py.


13 (a) If you take powers of a permutation matrix, why is some power P^k equal to I?

(b) Find a 5 by 5 permutation matrix so that the smallest power to equal I is P^6.
(This is a challenge question. Combine a 2 by 2 with a 3 by 3.)


14 Some permutation matrices are symmetric: P^T = P. Then P^T P = I becomes
P^2 = I.

(a) Find a 4 by 4 example with P^T = P that is not just an exchange of two rows.

(b) If P sends row 1 to row 4, then P^T sends row ___ to row ___. When
P^T = P the row exchanges come in pairs with no overlap.


15 Find 2 by 2 symmetric matrices with these properties:

(a) A is not invertible.
(b) A is invertible but cannot be factored into LU.
(c) A can be factored into LU but not into LL^T.


16 If A = A^T and B = B^T, which of these matrices are certainly symmetric?

(a) A^2 - B^2
(b) (A + B)(A - B)
(c) ABA
(d) ABAB.


17 (a) How many entries can be chosen independently, if A = A^T is 5 by 5?

(b) How do L and D (still 5 by 5) give the same number of choices?

(c) How many entries can be chosen if A is skew-symmetric? This means that
A^T = -A.


18 Suppose R is rectangular (m by n) and A is symmetric (m by m).

(a) Prove that R^T A R is symmetric. What shape is this matrix?

(b) Prove that R^T R has no negative numbers on its diagonal.


19 Factor these symmetric matrices into A = LDL^T. The pivot matrix D is diagonal:

a=|; a   and   a=l; 4   and   A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}


20 After elimination clears out column 1 below the pivot, find the symmetric 2 by 2
matrix that remains:

\[
A = \begin{bmatrix} 2 & 4 & 8 \\ 4 & 3 & 9 \\ 8 & 9 & 0 \end{bmatrix}
\quad\text{and}\quad
A = \begin{bmatrix} 1 & b & c \\ b & d & e \\ c & e & f \end{bmatrix}
\]


Questions 21-29 are about the factorizations PA = LU and A = L_1 P_1 U_1.


21 Find the PA = LU factorizations (and check them) for

\[
A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 2 & 3 & 4 \end{bmatrix}
\quad\text{and}\quad
A = \begin{bmatrix} 1 & 2 & 0 \\ 2 & 4 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\]


22 Find a 3 by 3 permutation matrix (call it A) that needs two row exchanges to reach
the end of elimination. What are its factors P, L, and U?


23 Factor this matrix into PA = LU. Factor it also into A = L_1 P_1 U_1 (hold the row
exchange until forward elimination is complete):

\[
A = \begin{bmatrix} 0 & 1 & 2 \\ 0 & 3 & 8 \\ 2 & 1 & 1 \end{bmatrix}
\]
24 Write out P after each step of the MATLAB code splu, when


0 O 1 
a=|) 4 and A=|/2 3 4 
0 5 6 


25 Write out P and L after each step of the code splu when

\[
A = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 1 & 0 \\ 2 & 5 & 4 \end{bmatrix}
\]


26 Extend the MATLAB code splu to a code spldu which factors PA into LDU.


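27 What is the matrix L_1 in A = L_1 P_1 U_1?

\[
A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 3 \\ 2 & 5 & 8 \end{bmatrix}
\longrightarrow
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 2 \\ 0 & 3 & 6 \end{bmatrix}
= P_1 U_1 =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 3 & 6 \\ 0 & 0 & 2 \end{bmatrix}
\]

28 Suppose A is a permutation matrix. Then L, U, L_1, U_1 all equal I. Explain why P
is A^T (in PA = LU) but P_1 is A.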


29 Show that the second pivots are different in U and U_1 when


A= 


= © © 


1 0 
ye | 
0 0 


30 (a) Choose E_21 to remove the 3 below the first pivot. Then multiply E_21 A E_21^T to
remove both 3's:

\[
A = \begin{bmatrix} 1 & 3 & 0 \\ 3 & 11 & 4 \\ 0 & 4 & 9 \end{bmatrix}
\quad\text{is going toward}\quad
D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

(b) Choose E_32 to remove the 4 below the second pivot. Then A is reduced to D
by E_32 E_21 A E_21^T E_32^T = D. Invert the E's to find L in A = LDL^T.


The final questions are about applications of the identity (Ax)^T y = x^T (A^T y).
31 Wires go between Boston, Chicago, and Seattle. Those cities are at voltages x_B, x_C,
and x_S. With unit resistances between cities, the currents are in y:

\[
y = Ax \quad\text{is}\quad
\begin{bmatrix} y_{BC} \\ y_{CS} \\ y_{BS} \end{bmatrix} =
\begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 1 & 0 & -1 \end{bmatrix}
\begin{bmatrix} x_B \\ x_C \\ x_S \end{bmatrix}
\]

(a) Find the total currents A^T y out of the three cities.
(b) Verify that (Ax)^T y agrees with x^T (A^T y), with six terms in both.


32 Producing x_1 trucks and x_2 planes needs x_1 + 50x_2 tons of steel, 40x_1 + 1000x_2
pounds of rubber, and 2x_1 + 50x_2 months of work. If the unit costs y_1, y_2, y_3 are
$700 per ton, $3 per pound, and $3000 per month, what are the values of a truck and
a plane? Those are the components of A^T y.


33 Ax gives the amounts of steel, rubber, and work to produce x in Problem 32. Find A.
Then Ax · y is the ___ of inputs while x · A^T y is the value of ___.


34 The matrix P that multiplies (x, y, z) to give (z, x, y) is also a rotation matrix. Find
P and P^3. The rotation axis a = (1, 1, 1) equals Pa. What is the angle of rotation
from v = (2, 3, -5) to Pv = (-5, 2, 3)?


35 Write A = [13] as the product EH of an elementary row operation matrix E and a
symmetric matrix H.


36 Suppose D is a diagonal matrix and U is upper triangular. Choose two matrices C
that make H = CDU into a symmetric matrix.


37 This chapter ends with a great new factorization A = EH. Start from A = LDU,
with 1's on the diagonals of L and U. Insert C^{-1} and C to find E and H:


A = (LC^{-1})(CDU) = (lower triangular E) (symmetric matrix H)


with 1's on the diagonal of E. What is C?


VECTOR SPACES 
AND SUBSPACES 


3.1 Spaces of Vectors 


To a newcomer, matrix calculations involve a lot of numbers. To you, they involve vectors. 
The columns of Ax and AB are linear combinations of n vectors—the columns of A. This 
chapter moves from numbers and vectors to a third level of understanding (the highest 
level). Instead of individual columns, we look at spaces of vectors. Without seeing vector 
spaces and especially their subspaces, you haven’t understood everything about Ax = b. 

Since this chapter goes a little deeper, it may seem a little harder. That is natural. We 
are looking inside the calculations, to find the mathematics. The authors’ job is to make it 
clear. These pages go to the heart of linear algebra. 

We begin with the most important vector spaces. They are denoted by R^1, R^2, R^3,
R^4, .... Each space R^n consists of a whole collection of vectors. R^5 contains all column
vectors with five components. This is called "5-dimensional space."


Definition The space R^n consists of all column vectors with n components.
The components are real numbers, which is the reason for the letter R. A vector whose n
components are complex numbers lies in the space C^n.


The space R^2 is represented by the usual xy plane. The two components of the vector give
the x and y coordinates of a point, and the vector goes out from (0, 0). Similarly the vectors
in R^3 correspond to points in three-dimensional space. The one-dimensional space R^1 is a
line (like the x axis). As before, we print vectors as a column between brackets, or along a
line using commas and parentheses:


4 : 
o | isinR®, (1,1,0,1, 1) isin RŽ, beet is in C2. 
l 


The great thing about linear algebra is that it deals easily with five-dimensional space.
We don't draw the vectors, we just need the five numbers (or n numbers). To multiply v
by 7, multiply every component by 7. Here 7 is a "scalar." To add vectors in R^5, add them
a component at a time. The two essential vector operations go on inside the vector space:


We can add two vectors in R^n, and we can multiply any vector by any scalar.


"Inside the vector space" means that the result stays in the same space. If v is the vector
in R^4 with components 1, 0, 0, 1, then 2v is the vector in R^4 with components 2, 0, 0, 2.
(In this case 2 is the "scalar.") A whole series of properties can be verified in every R^n.
The commutative law is v + w = w + v, the distributive law is c(v + w) = cv + cw, and
there is a unique "zero vector" satisfying 0 + v = v. Those are three of the eight conditions
listed at the start of the problem set. These conditions are required of every vector space.
The point of the eight conditions is that there are vector spaces other than R^n, and they
have to obey reasonable rules.

A real vector space is a set of "vectors" together with rules for vector addition and
for multiplication by real numbers. The addition and the multiplication must produce
vectors that are in the space. And the eight conditions must be satisfied (which is usually
no problem). Here are three vector spaces other than R^n:


M The vector space of all real 2 by 2 matrices. 
F The vector space of all real functions f(x). 
Z The vector space that consists only of a zero vector. 


In M the “vectors” are really matrices; in F the vectors are functions; in Z the only addition 
is 0+ 0 = 0. In each case we can add: matrices to matrices, functions to functions, zero 
vector to zero vector. We can multiply a matrix by 4 or a function by 4 or the zero vector 
by 4. The result is still in M or F or Z. The eight conditions (all easily checked) are 
discussed in the exercises. 

The space Z is zero-dimensional (by any reasonable definition of dimension). It is
the smallest possible vector space. We hesitate to call it R^0, which means no components;
you might think there was no vector. The vector space Z contains exactly one vector (zero).
No space can do without that zero vector. In fact each space has its own zero vector: the
zero matrix, the zero function, the vector (0, 0, 0) in R^3.


Subspaces 


At different times, we will ask you to think of matrices and functions as vectors. But at all
times, the vectors that we need most are ordinary column vectors. They are vectors with
n components, but maybe not all of the vectors with n components. There are important
vector spaces inside R^n.

Start with the usual three-dimensional space R^3. Choose a plane through the origin
(0, 0, 0). That plane is a vector space in its own right. If we add two vectors in the plane,
their sum is in the plane. If we multiply an in-plane vector by 2 or -5, it is still in the
plane. The plane is not R^2 (even if it is like R^2). The vectors have three components and
they belong to R^3. The plane is a vector space inside R^3.


Figure 3.1 The "four-dimensional" matrix space M. The "zero-dimensional" space Z.


This illustrates one of the most fundamental ideas in linear algebra. The plane is a
subspace of the full vector space R^3.


Definition A subspace of a vector space is a set of vectors (including 0) that satisfies two
requirements: If v and w are vectors in the subspace and c is any scalar, then
(i) v + w is in the subspace and (ii) cv is in the subspace.


In other words, the set of vectors is “closed” under addition and multiplication. Those 
operations leave us in the subspace. They follow the rules of the host space, so the eight 
required conditions are automatic—they are already satisfied in the larger space. We just 
have to check the requirements (i) and (ii) for a subspace. 

First fact: Every subspace contains the zero vector. The plane in R^3 has to go
through (0, 0, 0). We mention this separately, for extra emphasis, but it follows directly
from rule (ii). Choose c = 0, and the rule requires 0v to be in the subspace. Also -v is in
the subspace.

Planes that don't contain the origin fail those tests. When v is on the plane, -v
and 0v are not on the plane. A plane that misses the origin is not a subspace.

Lines through the origin are also subspaces. When we multiply by 5, or add two
vectors on the line, we stay on the line. The whole space is a subspace (of itself). Here is
a list of all the possible subspaces of R^3:


(L) Any line through (0, 0, 0)       (R^3) The whole space
(P) Any plane through (0, 0, 0)      (Z) The single vector (0, 0, 0)

If we try to keep only part of a plane or line, the requirements for a subspace don't hold.
Look at these examples.


Example 3.1 Keep only the vectors (x, y) whose components are positive or zero (a
quarter-plane). The vector (2, 3) is included but (-2, -3) is not. So rule (ii) is violated
when we try to multiply by c = -1. The quarter-plane is not a subspace.


Example 3.2 Include also the vectors whose components are both negative. Now we have
two quarter-planes. Requirement (ii) is satisfied; we can multiply by any c. But rule (i)
now fails. The sum of v = (2, 3) and w = (-3, -2) is (-1, 1), and this vector is outside
the quarter-planes.


Rules (i) and (ii) involve vector addition v + w and multiplication by scalars like c
and d. The rules can be combined into a single requirement, the rule for subspaces:


A subspace containing v and w must contain all linear combinations cv + dw. 
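The two failures in Examples 3.1 and 3.2 can be checked directly. This is a small Python sketch of ours; the function names are hypothetical, and the second test treats the axes as part of both quarter-planes.

```python
def in_quarter_plane(v):
    """Example 3.1: both components are positive or zero."""
    return v[0] >= 0 and v[1] >= 0

def in_two_quarter_planes(v):
    """Example 3.2: both components >= 0, or both components <= 0."""
    return (v[0] >= 0 and v[1] >= 0) or (v[0] <= 0 and v[1] <= 0)

def add(v, w):
    return (v[0] + w[0], v[1] + w[1])

def scale(c, v):
    return (c * v[0], c * v[1])

v, w = (2, 3), (-3, -2)

# Rule (ii) fails for one quarter-plane: v is inside but -v is not.
rule_ii_fails = in_quarter_plane(v) and not in_quarter_plane(scale(-1, v))

# Rule (i) fails for two quarter-planes: v and w are inside but v + w is not.
rule_i_fails = (in_two_quarter_planes(v) and in_two_quarter_planes(w)
                and not in_two_quarter_planes(add(v, w)))

print(add(v, w), rule_ii_fails, rule_i_fails)
```

A single counterexample is enough to show that a set is not a subspace; showing that a set is a subspace requires the rules to hold for all v, w, and c.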


Example 3.3 Inside the vector space M of all 2 by 2 matrices, here are two subspaces:

(U) All upper triangular matrices \begin{bmatrix} a & b \\ 0 & d \end{bmatrix}      (D) All diagonal matrices \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}.


Add any two matrices in U, and the sum is in U. The same is true for diagonal matrices
in D. In this case D is also a subspace of U! Of course the zero matrix is in these subspaces,
when a, b, and d all equal zero.

To find a smaller subspace of diagonal matrices, we could require a = d. The
matrices are multiples of the identity matrix I. The sum 2I + 3I is in this subspace, and
so is 3 times 4I. It is a "line of matrices" inside M and U and D.

Is the matrix I a subspace by itself? Certainly not. Only the zero matrix is. Your
mind will invent more subspaces of 2 by 2 matrices—write them down for Problem 5. 


The Column Space of A 


We come to the most important subspaces, which are tied directly to a matrix A. We are 
trying to solve Ax = b. If A is not invertible, the system is solvable for some b and not 
solvable for other b. How can we describe the good right sides—the vectors that can be 
written as A times some vector x? 

Remember the key to Ax. It is a combination of the columns of A. To get every 
possible b, we use every possible x. So start with the columns of A, and take all their 
linear combinations. This produces the column space of A. It is a vector space made up 
of column vectors—not just the columns of A, but all combinations Ax of those columns. 

By taking all combinations, we fill out a vector space. 


Definition The column space of A consists of all linear combinations of the columns. 
The combinations are the vectors Ax. 


For the solvability of Ax = b, the question is whether b is a combination of the 
columns. The vector b on the right side has to be in the column space produced by A on 
the left side. 


3A The system Ax = b is solvable if and only if b is in the column space of A.


When b is in the column space, it is a combination of the columns. The coefficients 
in that combination give us a solution x to the system Ax = b. 

Suppose A is an m by n matrix. Its columns have m components (not n). So the
columns belong to R^m. The column space of A is a subspace of R^m. The set of all
column combinations Ax satisfies rules (i) and (ii): When we add linear combinations or
multiply by scalars, we still produce combinations of the columns. The word "subspace"
is justified. Here is a 3 by 2 matrix, whose column space is a subspace of R^3.


Figure 3.2 The column space R(A) is a plane. Ax = b is solvable when b is on that
plane.


Example 3.4
\[
Ax \ \text{is}\
\begin{bmatrix} 1 & 0 \\ 4 & 3 \\ 2 & 3 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\quad\text{which is}\quad
x_1 \begin{bmatrix} 1 \\ 4 \\ 2 \end{bmatrix} +
x_2 \begin{bmatrix} 0 \\ 3 \\ 3 \end{bmatrix}
\]


The column space consists of all combinations of the two columns: any x_1 times the first
column plus any x_2 times the second column. Those combinations fill up a plane in R^3
(Figure 3.2). If the right side b lies on that plane, then it is one of the combinations and
(x_1, x_2) is a solution to Ax = b. The plane has zero thickness, so it is more likely that b is
not in the column space. Then there is no solution to our 3 equations in 2 unknowns.

Of course (0, 0, 0) is in the column space. The plane passes through the origin. There
is certainly a solution to Ax = 0. That solution, always available, is x = ___.


To repeat, the attainable right sides b are exactly the vectors in the column space.
One possibility is the first column itself: take x_1 = 1 and x_2 = 0. Another combination is
the second column: take x_1 = 0 and x_2 = 1. The new level of understanding is to see all
combinations. The whole subspace is generated by those two columns.


Notation The column space of A is denoted by R(A). This time R stands for “range”. 
We write R(A) and R(B) and R(C) for the column spaces of A and B and C. Start with 
the columns and take all their linear combinations. 


Example 3.5 Describe the column spaces for

\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
\quad\text{and}\quad
C = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 4 \end{bmatrix}.
\]

Solution The column space of A = I is the whole space R^2. Every vector is a combina-
tion of the columns of I. In vector space language, R(I) is R^2.


The column space of B is only a line. The second column (2, 4) is a multiple of the first
column (1, 2). Those vectors are different, but our eye is on vector spaces. The column
space contains (1, 2) and (2, 4) and all other vectors (c, 2c) along that line. The equation
Ax = b is only solvable when b is on the line.


The third matrix (with three columns) places no restriction on b. The column space R(C) is
all of R^2. Every b is attainable. The vector b = (5, 4) is column 2 plus column 3, so x can
be (0, 1, 1). The same vector (5, 4) is also 2(column 1) + column 3, so another possible x
is (2, 0, 1). This matrix has the same column space as A = I: any b is allowed. But now
x has more components and there are more solutions.
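The solvability test "b is in the column space" can be carried out by elimination: attaching b to A as an extra column must not produce a new pivot. The Python sketch below is ours (the names rank and in_column_space are hypothetical); it uses exact fractions and the matrices B and C of Example 3.5.

```python
from fractions import Fraction

def rank(M):
    """Count the pivots found by forward elimination (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in M]
    pivots = 0
    for col in range(len(rows[0])):
        # first available nonzero entry at or below the next pivot row
        r = next((i for i in range(pivots, len(rows)) if rows[i][col] != 0), None)
        if r is None:
            continue                      # no pivot in this column
        rows[pivots], rows[r] = rows[r], rows[pivots]
        for i in range(pivots + 1, len(rows)):
            m = rows[i][col] / rows[pivots][col]
            rows[i] = [a - m * p for a, p in zip(rows[i], rows[pivots])]
        pivots += 1
        if pivots == len(rows):
            break
    return pivots

def in_column_space(A, b):
    """b is a combination of the columns exactly when [A b] has no extra pivot."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(augmented) == rank(A)

def matvec(M, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

B = [[1, 2], [2, 4]]
C = [[1, 2, 3], [0, 0, 4]]

on_line = in_column_space(B, [3, 6])      # (c, 2c) with c = 3
off_line = in_column_space(B, [1, 0])
attainable = in_column_space(C, [5, 4])
two_solutions = (matvec(C, [0, 1, 1]), matvec(C, [2, 0, 1]))
print(on_line, off_line, attainable, two_solutions)
```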


The next section creates another vector space, to describe all the possible solutions of 
Ax = 0. This section created the column space, to describe all the attainable right sides b. 


Problem Set 3.1 


The first problems are about vector spaces in general. The vectors in those spaces 
are not necessarily column vectors. In the definition of a vector space, vector addi- 
tion x + y and scalar multiplication cx are required to obey the following eight rules: 

(1) x + y = y + x

(2) x + (y + z) = (x + y) + z

(3) There is a unique "zero vector" such that x + 0 = x for all x

(4) For each x there is a unique vector -x such that x + (-x) = 0

(5) 1 times x equals x

(6) (c_1 c_2)x = c_1(c_2 x)

(7) c(x + y) = cx + cy

(8) (c_1 + c_2)x = c_1 x + c_2 x.


1 Suppose the sum (x_1, x_2) + (y_1, y_2) is defined to be (x_1 + y_2, x_2 + y_1). With the
usual multiplication cx = (cx_1, cx_2), which of the eight conditions are not satisfied?


2 Suppose the multiplication cx is defined to produce (cx_1, 0) instead of (cx_1, cx_2).
With the usual addition in R^2, are the eight conditions satisfied?


3 (a) Which rules are broken if we keep only the positive numbers x > 0 in R^1?
Every c must be allowed.


(b) The positive numbers with x + y and cx redefined to equal the usual xy and
x^c do satisfy the eight rules. Test rule 7 when c = 3, x = 2, y = 1. (Then
x + y = 2 and cx = 8.) Which number is the "zero vector"? The vector "-2"
is the number ___.
4 The matrix A = [$22] is a "vector" in the space M of all 2 by 2 matrices. Write
down the zero vector in this space, the vector +A, and the vector -A. What matrices
are in the smallest subspace containing A?


5 (a) Describe a subspace of M that contains A = | 4 8 ] but not B = [$ 9].

(b) If a subspace of M contains A and B, must it contain I?

(c) Describe a subspace of M that contains no nonzero diagonal matrices.


6 The functions f(x) = x^2 and g(x) = 5x are "vectors" in F. This is the vector
space of all real functions. (The functions are defined for -∞ < x < ∞.) The
combination 3f(x) - 4g(x) is the function h(x) = ___.


7 Which rule is broken if multiplying f(x) by c gives the function f(cx)? Keep the
usual addition f(x) + g(x).


8 If the sum of the "vectors" f(x) and g(x) is defined to be the function f(g(x)), then
the "zero vector" is g(x) = x. Keep the usual scalar multiplication cf(x) and find
two rules that are broken.


Questions 9-18 are about the “subspace requirements”: x + y and cx must stay in 
the subspace. 


9 One requirement can be met while the other fails. Show this with


(a) A set of vectors in R^2 for which x + y stays in the set but (1/2)x may be outside.
(b) A set of vectors in R^2 (other than two quarter-planes) for which every cx stays
in the set but x + y may be outside.


10 Which of the following subsets of R^3 are actually subspaces?


(a) The plane of vectors (b_1, b_2, b_3) with b_1 = 0.

(b) The plane of vectors with b_1 = 1.

(c) The vectors with b_1 b_2 b_3 = 0.

(d) All linear combinations of v = (1, 4, 0) and w = (2, 2, 2).
(e) All vectors that satisfy b_1 + b_2 + b_3 = 0.

(f) All vectors with b_1 < b_2 < b_3.


11 Describe the smallest subspace of the matrix space M that contains
1 0 O 1 

(a) | 0 1 and | 0 
L 4 1 0 1 0 

(b) k l (c) ; o| and p > 
12 Let P be the plane in R^3 with equation x + y - 2z = 4. Find two vectors in P and
check that their sum is not in P.


13 Let P_0 be the plane through (0, 0, 0) parallel to the previous plane P. What is the
equation for P_0? Find two vectors in P_0 and check that their sum is in P_0.


14 The subspaces of R^3 are planes, lines, R^3 itself, or Z containing (0, 0, 0).


(a) Describe the three types of subspaces of R^2.
(b) Describe the five types of subspaces of R^4.


15 (a) The intersection of two planes through (0, 0, 0) is probably a ___.


(b) The intersection of a plane through (0, 0, 0) with a line through (0, 0, 0) is
probably a ___.


(c) If S and T are subspaces of R^5, prove that S ∩ T (the set of vectors in both
subspaces) is a subspace of R^5. Check the requirements on x + y and cx.


16 Suppose P is a plane through (0, 0, 0) and L is a line through (0, 0, 0). The smallest
vector space containing both P and L is either ___ or ___.


17 (a) Show that the set of invertible matrices in M is not a subspace. 
(b) Show that the set of singular matrices in M is not a subspace. 


18 True or false (check addition in each case by an example): 


(a) The symmetric matrices in M (with A^T = A) form a subspace.
(b) The skew-symmetric matrices in M (with A^T = -A) form a subspace.
(c) The unsymmetric matrices in M (with A^T ≠ A) form a subspace.


Questions 19-27 are about column spaces R(A) and the equation Ax = b.


19 Describe the column spaces (lines or planes) of these particular matrices:
0 


1 2 1 1 0 
A=1|0 0 and B=1]0 and C=/2 0 
0 0 0 0 0 


20 For which right sides b do these systems have solutions? 


1 4 2 x] bı 1 4 bi 
(a) 2 8 4|lxa|=]|b (b) 2 9 e ~|b 
-1 —4 7 Nas b3 af d b3 


21 Adding row 1 of A to row 2 produces B. Adding column 1 to column 2 produces C.
A combination of the columns of ___ is also a combination of the columns of A.
Those two matrices have the same column ___.


L 2 1 2 1 3 
a=; a and B= | | and gal | 


22 For which vectors (b_1, b_2, b_3) do these systems have a solution?

\[
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
\]
\[
\text{and}\quad
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
\]


23 If we add an extra column b to a matrix A, then the column space gets larger unless
___. Give an example where the column space gets larger and an example where
it doesn't.


24 The columns of AB are combinations of the columns of A. The column space of
AB is contained in (possibly equal to) the column space of ___. Give an example
where those two column spaces are not equal.


25 Suppose Ax = b and Ay = b* are both solvable. Then Az = b + b* is solvable.
What is z? This translates into: If b and b* are in the column space R(A), then
___.


26 If A is any 5 by 5 invertible matrix, then its column space is ___. Why?


27 True or false (with a counterexample if false):


(a) The vectors b that are not in the column space R(A) form a subspace.
(b) If R(A) contains only the zero vector, then A is the zero matrix.

(c) The column space of 2A equals the column space of A.

(d) The column space of A - I equals the column space of A.


28 Construct a 3 by 3 matrix whose column space contains (1, 1, 0) and (1, 0, 1) but not
(1, 1, 1).


3.2 The Nullspace of A: Solving Ax = 0 


This section is about Ax = 0 for rectangular matrices. You will see the crucial role of the
pivots, and especially the importance of missing pivots. Note first: Ax = 0 can always be
solved. One immediate solution is x = 0, the zero combination. Elimination will decide
if there are other solutions to Ax = 0, and find out what they are.

Start with an important subspace. It contains every solution x. The columns of A
have m components, but now x has n components. Please notice that difference!


Definition The nullspace of A consists of all solutions to Ax = 0. These vectors x are
in R^n. The nullspace containing the solutions x is denoted by N(A).

Check that the solution vectors form a subspace. Suppose x and y are in the nullspace (this
means Ax = 0 and Ay = 0). The rules of matrix multiplication give A(x + y) = 0 + 0.
The rules also give A(cx) = c0. The right sides are still zero. Therefore x + y and cx are
in the nullspace N(A). Since we can add and multiply without leaving the nullspace, it is
a subspace.

To repeat: The solution vectors x are in R^n. The nullspace is a subspace of R^n, while
the column space R(A) is a subspace of R^m. If the right side b is not zero, the solutions of
Ax = b do not form a subspace. The vector x = 0 is not a solution if b ≠ 0.


Example 3.6 The solutions to x + 2y + 3z = 0 form a plane through the origin. It is
a subspace of R^3. The plane is the nullspace of the 1 by 3 matrix A = [1 2 3]. The
solutions to x + 2y + 3z = 6 also form a plane, but this plane is not a subspace.


For many matrices, the only solution to Ax = 0 is x = 0. Their nullspaces contain only
that single vector x = 0. We call this space Z, for zero. The only combination of the
columns that produces b = 0 is then the "zero combination" or "trivial combination." The
solution is trivial (just x = 0) but the idea is not trivial.

This case of a zero nullspace and a unique solution is of the greatest importance.


Example 3.7 Describe the nullspaces of $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$.


Solution Apply elimination to the linear equations Ax = 0: 


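\[
\begin{aligned} x_1 + 2x_2 &= 0 \\ 2x_1 + 4x_2 &= 0 \end{aligned}
\quad\Rightarrow\quad
\begin{aligned} x_1 + 2x_2 &= 0 \\ 0 &= 0 \end{aligned}
\]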


There is really only one equation. The second equation is twice the first equation. In the 
row picture, the line x1 + 2x2 = 0 is the same as the line 2x1 + 4x2 = 0. That line is the
nullspace N(A).

To describe a typical point in the nullspace, give any value to x2. This unknown is
“free”. Then x1 equals -2x2. If x2 = 4 then x1 = -8. If x2 = 5 then x1 = -10. Those
vectors (-8, 4) and (-10, 5) and (-2x2, x2) lie in the nullspace N(A).

The nullspace of this A is a line in the direction of (-2, 1). The line goes through
(0, 0), which belongs to every nullspace.

For the matrix B, the nullspace contains only the zero vector: 


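\[
\begin{aligned} x_1 + 2x_2 &= 0 \\ 3x_1 + 4x_2 &= 0 \end{aligned}
\quad\Rightarrow\quad
\begin{aligned} x_1 + 2x_2 &= 0 \\ -2x_2 &= 0 \end{aligned}
\quad\Rightarrow\quad
\begin{aligned} x_1 &= 0 \\ x_2 &= 0 \end{aligned}
\]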


The only solution to Bx = 0 is x = 0. B is invertible so automatically x = B^{-1}0 = 0.
Elimination easily found the nullspaces for these 2 by 2 systems. The same method extends
to m by n systems, square or rectangular. But there are new possibilities for columns
without pivots. You have to see the elimination process one more time.
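
A quick numerical check of Example 3.7 makes the two cases concrete. This is a minimal sketch in Python, not one of the book's MATLAB M-files, and the helper name `matvec` is introduced here only for illustration.

```python
def matvec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

A = [[1, 2], [2, 4]]   # row 2 is twice row 1: only one pivot
B = [[1, 2], [3, 4]]   # two pivots: invertible

# Every multiple of (-2, 1) solves Ax = 0, so N(A) is a line.
for x2 in (4, 5, -3):
    assert matvec(A, [-2 * x2, x2]) == [0, 0]

# det(B) is nonzero, so B is invertible and x = 0 is the only solution of Bx = 0.
det_B = B[0][0] * B[1][1] - B[0][1] * B[1][0]
print(det_B)
```

The same vector (-2, 1) that lies in N(A) is not in N(B), which is one way to see that the two nullspaces differ.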

Solving Ax = 0 by Elimination


This is important. We solve m equations in n unknowns—and the right sides are all zero. 
The left sides are simplified by row operations, after which we read off the solution (or 
solutions). Remember the two stages in solving Ax = 0: 


1 Forward elimination from A to a triangular U. 
2 Back substitution in Ux = 0 to find x. 


You will notice a difference in back substitution, when A and U have fewer than n pivots. 
We are allowing all matrices in this section, not just the nice ones (with inverses). 

Pivots are still nonzero. The columns below them are still zero. But it might happen 
that a column has no pivot. In that case, don’t stop the calculation. Go on to the next 
column. The first example is a 3 by 4 matrix: 


\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 8 & 10 \\ 3 & 6 & 11 & 14 \end{bmatrix}
\]


Certainly a11 = 1 is the first pivot. Clear out the 2 and 3 below it:


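\[
A \rightarrow \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 2 & 2 \end{bmatrix}
\qquad
\begin{array}{l} \\ \text{(subtract } 2 \times \text{row 1)} \\ \text{(subtract } 3 \times \text{row 1)} \end{array}
\]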


The second column has a zero in the pivot position. We look below the zero for a nonzero 
entry, ready to do a row exchange. The entry below that position is also zero. Elimination 
can do nothing with the second column. This signals trouble, which we expect anyway for 
a rectangular matrix. There is no reason to quit, and we go on to the third column. 

The second pivot is 2 (but it is in the third column). Subtracting row 2 from row 3 
clears out that column. We arrive at 


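\[
U = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\qquad
\begin{array}{l} \text{(only two pivots)} \\ \text{(the last equation} \\ \text{became } 0 = 0\text{).} \end{array}
\]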


The fourth column also has a zero in the pivot position—but nothing can be done. There 
is no row below it to exchange, and forward elimination is complete. The matrix has three 
rows, four columns, and only two pivots. The original Ax = 0 seemed to involve three 
different equations, but the third equation is the sum of the first two. It is automatically 
satisfied (0 = 0) when the first two equations are satisfied. Elimination reveals the inner 
truth about a system of equations. 
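
The forward elimination just described, including the rule "if a column has no pivot, go on to the next column," can be sketched in a few lines. This is an illustrative Python version using exact fractions, not the book's MATLAB code; the function name `echelon` and the 0-based column numbering are assumptions of this sketch.

```python
from fractions import Fraction

def echelon(A):
    """Forward elimination with row exchanges. Returns (U, pivot_columns)."""
    U = [[Fraction(v) for v in row] for row in A]
    m, n = len(U), len(U[0])
    pivcols = []
    row = 0
    for col in range(n):
        if row == m:
            break
        # Look in this column, at or below the current row, for a nonzero entry.
        pivot_row = next((i for i in range(row, m) if U[i][col] != 0), None)
        if pivot_row is None:
            continue  # no pivot in this column: go on to the next column
        U[row], U[pivot_row] = U[pivot_row], U[row]
        for i in range(row + 1, m):
            mult = U[i][col] / U[row][col]
            U[i] = [a - mult * b for a, b in zip(U[i], U[row])]
        pivcols.append(col)
        row += 1
    return U, pivcols

A = [[1, 2, 3, 4], [2, 4, 8, 10], [3, 6, 11, 14]]
U, pivcols = echelon(A)
print(pivcols)  # 0-based, so columns 1 and 3 of the text appear as 0 and 2
```

Exact fractions avoid the question of when a small number counts as zero, which a floating-point code has to settle with a tolerance.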

Now comes back substitution, to find all solutions to Ux = 0. With four unknowns 
and only two pivots, there are many solutions. The question is how to write them all down. 
A good method is to separate the basic variables or pivot variables from the free variables.

P The pivot variables are x1 and x3, because columns 1 and 3 contain pivots.
F The free variables are x2 and x4, because columns 2 and 4 have no pivots.


The free variables x2 and x4 can be given any values whatsoever. Then back substitution 
finds the pivot variables x1 and x3. (In Chapter 2 no variables were free. When A is
invertible, all variables are pivot variables.) You could think of the pivot variables on the 
left side and the free variables moved to the right side: 


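\[
Ux = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\quad\text{gives}\quad
\begin{aligned} x_1 + 3x_3 &= -2x_2 - 4x_4 \\ 2x_3 &= -2x_4. \end{aligned}
\]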


To describe all the solutions, the best way is to find two special solutions. Choose the free 
variables to be 1 or 0. Then the pivot variables are determined by the equations Ux = 0. 


Special Solutions 
1 Set x2 = 1 and x4 = 0. By back substitution x3 = 0 and x1 = -2.
2 Set x2 = 0 and x4 = 1. By back substitution x3 = -1 and x1 = -1.


These special solutions solve Ux = 0 and therefore Ax = 0. They are in the nullspace. 
The good thing is that every solution is a combination of the special solutions. 


Complete Solution 


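\[
x = x_2 \underbrace{\begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}}_{\text{special}}
+ x_4 \underbrace{\begin{bmatrix} -1 \\ 0 \\ -1 \\ 1 \end{bmatrix}}_{\text{special}}
= \underbrace{\begin{bmatrix} -2x_2 - x_4 \\ x_2 \\ -x_4 \\ x_4 \end{bmatrix}}_{\text{complete}}
\tag{3.1}
\]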


Please look again at that answer. It is the main goal of this section. The vector (-2, 1, 0, 0)
is the special solution when x2 = 1 and x4 = 0. The second special solution has x2 = 0 
and x4 = 1. All solutions are linear combinations of those two. The special solutions are 
in the nullspace N(A), and their combinations fill out the whole nullspace. 

The MATLAB code null computes these special solutions. They go into the columns 
of a nullspace matrix N. The complete solution to Ax = 0 is a combination of those 
columns. Once we have the special solutions, we have everything. 
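
The claim that every combination of the two special solutions solves Ax = 0 is easy to test numerically. The following Python sketch (not the book's `null` M-file) stores the special solutions as the columns of a nullspace matrix; the helper names `matvec` and `combination` are hypothetical.

```python
def matvec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

A = [[1, 2, 3, 4], [2, 4, 8, 10], [3, 6, 11, 14]]
s1 = [-2, 1, 0, 0]    # special solution with x2 = 1, x4 = 0
s2 = [-1, 0, -1, 1]   # special solution with x2 = 0, x4 = 1

# Nullspace matrix N: the special solutions are its columns.
N = [[a, b] for a, b in zip(s1, s2)]

def combination(x2, x4):
    """The complete solution x = x2*s1 + x4*s2, which equals N times (x2, x4)."""
    return matvec(N, [x2, x4])

for x2, x4 in [(1, 0), (0, 1), (3, -7)]:
    assert matvec(A, combination(x2, x4)) == [0, 0, 0]
```

Note that the free variables x2 and x4 reappear unchanged as components 2 and 4 of the combination.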

There is a special solution for each free variable. If no variables are free—this means 
there are n pivots—then the only solution to Ux = 0 and Ax = 0 is the trivial solution 
x = 0. This is the uniqueness case, when all variables are pivot variables. In that case the 
nullspaces of A and U contain only the zero vector. With no free variables, the output from 
null is an empty matrix N. 

Example 3.8 Find the nullspaces of $U = \begin{bmatrix} 1 & 5 & 7 \\ 0 & 0 & 9 \end{bmatrix}$ and $A = \begin{bmatrix} 3 & 8 \\ 0 & 4 \end{bmatrix}$.


The second column of U has no pivot. So x2 is free. The special solution has x2 = 1. Back
substitution into 9x3 = 0 gives x3 = 0. Then x1 + 5x2 = 0 or x1 = -5. The solutions to
Ux = 0 are

\[
x = x_2 \begin{bmatrix} -5 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -5x_2 \\ x_2 \\ 0 \end{bmatrix}
\]

The nullspace of U is a line through the special solution. One variable is free. N has one column.

For the matrix A, no variables are free. Both columns have pivots. The equation
4x2 = 0 gives x2 = 0, and then 3x1 = 0 forces x1 = 0. The nullspace consists of the
point x = 0 (the zero vector). The nullspace matrix N is empty (but the nullspace is never
empty!).


Echelon Matrices 


Forward elimination goes from A to U. The process starts with an m by n matrix A. It acts 
by row operations, including row exchanges. It goes on to the next column when no pivot 
is available in the current column. The process ends with an m by n matrix U, which has 
a special form. U is an echelon matrix, or “staircase matrix”. 
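Here is a 4 by 7 echelon matrix with the three pivots boxed:

\[
U = \begin{bmatrix}
\boxed{x} & x & x & x & x & x & x \\
0 & \boxed{x} & x & x & x & x & x \\
0 & 0 & 0 & 0 & 0 & \boxed{x} & x \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\qquad
\begin{array}{l}
\text{three pivot variables } x_1, x_2, x_6 \\
\text{four free variables } x_3, x_4, x_5, x_7 \\
\text{four special solutions}
\end{array}
\]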


Question What are the column space R(U) and the nullspace N(U) for this matrix? 


Answer The columns have four components so they lie in R^4. The fourth component
of every column is zero. Every combination of the columns (every vector in R(U))
has fourth component zero. The column space of U consists of all vectors of the form
(b1, b2, b3, 0). For those vectors we can solve Ux = b by back substitution. These vectors
b are all possible combinations of the seven columns.*


The nullspace N(U) is a subspace of R^7. The solutions to Ux = 0 are combinations of the
four special solutions—one for each free variable: 


The free variables are x3, x4, x5, x7 because columns 3, 4, 5, 7 have no pivots. 
Set one free variable to 1 and set the other free variables to zero. Solve Ux = 0 
for the pivot variables. Then (x1, ..., x7) is one of the four special solutions 
in the nullspace matrix N. 


*We really only need the three columns with pivots. The free variables can all be set to zero, to find one 
particular solution of Ux = b. 

Since echelon matrices U are so useful, we spell out their rules. The nonzero rows
come first. The pivots are the first nonzero entries in those rows, and they go down in a 
staircase pattern. The usual row operations (in the code plu) produce a column of zeros 
below every pivot. 

Counting the pivots leads to an extremely important theorem. Suppose A has more 
columns than rows. With n > m there is at least one free variable. The system Ax = 0
has a special solution. This solution is not zero! 


3B If Ax = 0 has more unknowns than equations (n > m: more columns than rows), then 
it has nonzero solutions. 


A short wide matrix has nonzero vectors in its nullspace. There must be at least n — m 
free variables, since the number of pivots cannot exceed m. (The matrix only has m rows, 
and a row never has two pivots.) Of course a row might have no pivot—which means an 
extra free variable. But here is the point: When a free variable can be set to 1, the equation 
Ax = 0 has a nonzero solution. 


To repeat: There are at most m pivots. With n > m, the system Ax = 0 has a free 
variable and a nonzero solution. Actually there are infinitely many solutions, since any 
multiple cx is also a solution. The nullspace contains at least a line of solutions. With two 
free variables, there are two special solutions and the nullspace is even larger. 

The nullspace is a subspace whose “dimension” is the number of free variables. This 
central idea—the dimension of a subspace—is defined and explained in this chapter. 


The Reduced Echelon Matrix R 


From the echelon matrix U we can go one more step. Continue onward from 


\[
U = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]


This matrix can be simplified by further row operations (which must also be done 
to the right side). We can divide the second row by 2. Then both pivots equal 1. We
can subtract 3 times this new row [0 0 1 1] from the row above. That produces 
a zero above the second pivot as well as below. The “reduced” echelon matrix is 


\[
R = \begin{bmatrix} 1 & 2 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]


A reduced echelon matrix has 1’s as pivots. It has 0’s everywhere else in the pivot columns. 
Zeros above pivots come from upward elimination. 

If A is invertible, its echelon form is a triangular U. Its reduced echelon form is the 
identity matrix R = I. This is the ultimate in row reduction.

Some classes will take the extra step from U to R, others won’t. It is optional.
Whether you do it or not, you know how. The equations look simpler but the solutions 
to Rx = 0 are the same as the solutions to Ux = 0. Dividing an equation by 2, or
subtracting one equation from another, has no effect on the nullspace. This is Gauss-Jordan
elimination, which produces the extra zeros in R (above the pivots). Gaussian
elimination stops at U.

The zeros in R make it easy to find the special solutions (the same as before): 


1 Set x2 = 1 and x4 = 0. Solve Rx = 0. Then x1 = -2 and x3 = 0.
2 Set x2 = 0 and x4 = 1. Solve Rx = 0. Then x1 = -1 and x3 = -1.


The numbers -2 and 0 are sitting in column 2 of R (with plus signs). The numbers -1
and -1 are sitting in column 4 (with plus signs). By reversing signs we can read off
the special solutions from the reduced matrix R. The general solution to Ax = 0 or
Ux = 0 or Rx = 0 is a combination of those two special solutions: The nullspace
N(A) = N(U) = N(R) contains


\[
x = x_2 \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} -1 \\ 0 \\ -1 \\ 1 \end{bmatrix}
= \text{complete solution of } Ax = 0.
\]
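
Reading the special solutions off R can be automated. The sketch below is a Python illustration of the idea, using exact fractions; it is not the book's `ref` or `null` code, and the names `rref` and `special_solutions` as well as the 0-based column indices are choices made for this example.

```python
from fractions import Fraction

def rref(A):
    """Gauss-Jordan elimination. Returns (R, pivot_columns), columns 0-based."""
    R = [[Fraction(v) for v in row] for row in A]
    m, n = len(R), len(R[0])
    pivcols, row = [], 0
    for col in range(n):
        if row == m:
            break
        p = next((i for i in range(row, m) if R[i][col] != 0), None)
        if p is None:
            continue                      # free column: no pivot here
        R[row], R[p] = R[p], R[row]
        piv = R[row][col]
        R[row] = [v / piv for v in R[row]]   # make the pivot equal 1
        for i in range(m):                    # zeros above and below the pivot
            if i != row and R[i][col] != 0:
                mult = R[i][col]
                R[i] = [a - mult * b for a, b in zip(R[i], R[row])]
        pivcols.append(col)
        row += 1
    return R, pivcols

def special_solutions(A):
    """One special solution per free column: free variable 1, signs of R reversed."""
    R, pivcols = rref(A)
    n = len(R[0])
    sols = []
    for f in (j for j in range(n) if j not in pivcols):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for k, p in enumerate(pivcols):
            x[p] = -R[k][f]
        sols.append(x)
    return sols

A = [[1, 2, 3, 4], [2, 4, 8, 10], [3, 6, 11, 14]]
R, pivcols = rref(A)
specials = special_solutions(A)
```

The sign reversal in `x[p] = -R[k][f]` is exactly the "reversing signs" step described in the text.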


To summarize: The pivot columns of R hold the identity matrix (with zero rows 
below). The free columns show the special solutions. Back substitution is very quick but 
it costs more to reach R. Most computer programs stop at U, but ref goes on: 


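\[
A = \begin{bmatrix} 1 & 3 & 3 \\ 2 & 6 & 9 \\ -1 & -3 & 3 \end{bmatrix}
\rightarrow
U = \begin{bmatrix} 1 & 3 & 3 \\ 0 & 0 & 3 \\ 0 & 0 & 0 \end{bmatrix}
\rightarrow
R = \begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}
\]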


A note about the theory The echelon matrix U depends on the order of the elimination 
steps. A row exchange gives new pivots and a different U. But the pivots in R are all 1’s. 
This reduced matrix stays exactly the same, even if you get to it with extra row exchanges. 
The theory of linear algebra notices four results that do not depend on the order of the 
elimination steps: 


1 The vectors x in the nullspace of A. 
2 The selection of free variables and pivot variables. 
3 The special solutions to Ax = 0 (each with a free variable equal to 1). 


4 The reduced echelon form R. 


The key is to realize what it means when a column has no pivot. That free column 
is a combination of previous pivot columns. This is exactly the special solution, with free 
variable 1. The free column and pivot columns (times the right numbers from the special
solution) solve Ax = 0. Those numbers are also in R. So the original A controls 1-4
above for these reasons: 


1 The nullspace is determined by Ax = 0, before elimination starts.
2 xj is free exactly when column j of A is a combination of columns 1, ..., j - 1.
3 The special solutions express the free columns as combinations of the pivot columns.
4 R has those special combinations in its free columns. Its pivot columns contain I.


You can do any row operations in any order, and you never reach a different R. 


The code plu computes the echelon matrix U. Notice the differences from splu for square 
matrices. If a column has no pivot, the algorithm goes on to the next column. The program 
calls findpiv to locate the first entry larger than tol in the available columns and rows. This 
subroutine findpiv needs only four strange lines: 


[m,n] = size(A); r = find(abs(A(:)) > tol);
if isempty(r), return, end
r = r(1); j = fix((r-1)/m) + 1;
p = p + j - 1; k = k + r - (j-1)*m - 1;


The pivot location is called [r, p] and p is added to the list of pivot columns. This 
list can be printed. The program ref computes the reduced echelon form R, and null puts 
the special solutions (one for each free variable) into the nullspace matrix N. 


function [P, L, U, pivcol] = plu(A)        % M-file: plu
[m,n] = size(A);
P = eye(m,m);
L = eye(m,m);
U = zeros(m,n);
pivcol = [];
tol = 1.e-6;
p = 1;
for k = 1:min(m,n)
   % Locate the next pivot in the remaining rows and columns.
   [r, p] = findpiv(A(k:m,p:n),k,p,tol);
   if r ~= k
      % Row exchange: swap rows r and k of A, L, and P.
      A([r k],1:n) = A([k r],1:n);
      if k > 1, L([r k],1:k-1) = L([k r],1:k-1); end
      P([r k],1:m) = P([k r],1:m);
   end
   if abs(A(k,p)) >= tol
      pivcol = [pivcol p];
      % Eliminate below the pivot, storing the multipliers in L.
      for i = k+1:m
         L(i,k) = A(i,p)/A(k,p);
         for j = k+1:n
            A(i,j) = A(i,j) - L(i,k) * A(k,j);
         end
      end
   end
   % Copy row k into U, setting entries below the tolerance to zero.
   for j = k:n
      U(k,j) = A(k,j) * (abs(A(k,j)) >= tol);
   end
   if p < n, p = p+1; end
end


function [R, pivcol] = ref(A)              % M-file: ref
% Scale rows of U so that all pivots equal 1.
% Eliminate nonzeros above the pivots to reach R.
[P,L,U,pivcol] = plu(A);
R = U;
[m,n] = size(R);
for k = 1:length(pivcol)
   p = pivcol(k);
   for j = p+1:n
      R(k,j) = R(k,j)/R(k,p);
   end
   R(k,p) = 1;
   for i = 1:k-1
      for j = p+1:n
         R(i,j) = R(i,j) - R(i,p) * R(k,j);
      end
      R(i,p) = 0;
   end
end


function N = null(A)                       % M-file: null
% The n-r columns of N are special solutions to Ax = 0.
% N combines I with the nonpivot columns of R.
[R,pivcol] = ref(A);
[m,n] = size(A);
r = length(pivcol);
nopiv = 1:n;
nopiv(pivcol) = [];
N = zeros(n,n-r);
if n > r
   N(nopiv,:) = eye(n-r,n-r);
   if r > 0
      N(pivcol,:) = -R(1:r,nopiv);
   end
end

Summary of the Situation

From Ax = 0:
   We can’t see how many special solutions there are.
   We can’t see the actual numbers in the special solutions.


If m < n, there is at least one free variable and one special solution. The matrix has 
more columns than rows. There are nonzero vectors in its nullspace. If m > n we cannot 
determine from looking at Ax = 0 whether x = 0 is the only solution. 


From Ux = 0:
   We can see how many special solutions there are.
   We can’t see the actual numbers in the special solutions.


The number of special solutions is the number of free variables. This is the total number 
of variables (n) minus the number of pivots (r). Then n — r free variables have columns 
without pivots. To find the n — r special solutions, do back substitutions in Ux = 0—after 
assigning the value 1 to a free variable. 


From Rx = 0:
   We can see how many special solutions there are.
   We can see the actual numbers in the special solutions.


In the reduced echelon matrix R, every pivot equals 1—with zeros above and below. If it 
happens that these r pivot columns come first, the reduced form looks like 


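\[
R = \begin{bmatrix} I & F \\ 0 & 0 \end{bmatrix}
\qquad
\begin{array}{l} r \text{ pivot rows} \\ m - r \text{ zero rows} \end{array}
\]

with r pivot columns followed by n - r free columns.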


The pivot part is the identity matrix. The free part F can contain any numbers. They come 
from elimination on A—downward to U and upward to R. When A is invertible, F is 
empty and R = I. 

The special solutions to Rx = 0 (also to Ux = 0 and Ax = 0) can be found directly 
from R. These n — r solutions go into the columns of a nullspace matrix N. Notice how 
these block matrices give RN = 0: 


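\[
N = \begin{bmatrix} -F \\ I \end{bmatrix}
\qquad
\begin{array}{l} r \text{ pivot variables} \\ n - r \text{ free variables} \end{array}
\]

with n - r columns.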
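
Written out, the block multiplication behind RN = 0 is short. This assumes, as in the text, that the r pivot columns come first, so that I is r by r in R and (n - r) by (n - r) in N:

```latex
\[
RN = \begin{bmatrix} I & F \\ 0 & 0 \end{bmatrix}
     \begin{bmatrix} -F \\ I \end{bmatrix}
   = \begin{bmatrix} I(-F) + F\,I \\ 0(-F) + 0\,I \end{bmatrix}
   = \begin{bmatrix} -F + F \\ 0 \end{bmatrix}
   = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]
```

Each column of N therefore solves Rx = 0, and so it also solves Ux = 0 and Ax = 0.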


Example 3.9

A has r = 2 pivots and n - r = 3 - 2 free variables. The special solution has free variable
x3 = 1. Then x2 = -1 and x1 = 1, so the special solution is x = (1, -1, 1).

The code null constructs N from R. The two parts of N are —F and I as above, but the 
pivot variables may be mixed in with the free variables—so J may not come last. 


Problem Set 3.2 


Questions 1-8 are about the matrices in Problems 1 and 5. 


1 Reduce these matrices to their ordinary echelon forms U: 
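\[
\text{(a)}\ A = \begin{bmatrix} 1 & 2 & 2 & 4 & 6 \\ 1 & 2 & 3 & 6 & 9 \\ 0 & 0 & 1 & 2 & 3 \end{bmatrix}
\qquad
\text{(b)}\ A = \begin{bmatrix} 2 & 4 & 2 \\ 0 & 4 & 4 \\ 0 & 8 & 8 \end{bmatrix}
\]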


Which are the free variables and which are the pivot variables? 


2 For the matrices in Problem 1, find a special solution for each free variable. (Set the 
free variable to 1. Set the other free variables to zero.) 


3 By combining the special solutions in Problem 2, describe every solution to Ax = 0. 
The nullspace of A contains only the vector x = 0 when ____.


4 By further row operations on each U in Problem 1, find the reduced echelon form R. 
The nullspace of R is the nullspace of U. 


5 By row operations reduce A to its echelon form U. Write down a 2 by 2 lower 
triangular L such that A = LU. 


ay E =E E 
(a) =| 6 a (b) DE 6 3 


6 For the matrices in Problem 5, find the special solutions to Ax = 0. For an m by n
matrix, the number of pivot variables plus the number of free variables is ____.


7 In Problem 5, describe the nullspace of each A in two ways. Give the equations for
the plane or line N(A), and give all vectors x that satisfy those equations (combinations
of the special solutions).


8 Reduce the echelon forms U in Problem 5 to R. For each R draw a box around the 
identity matrix that is in the pivot rows and pivot columns. 

Questions 9-17 are about free variables and pivot variables.


9 True or false (with reason if true and example if false):

(a) A square matrix has no free variables.
(b) An invertible matrix has no free variables.
(c) An m by n matrix has no more than n pivot variables.
(d) An m by n matrix has no more than m pivot variables.


10 Construct 3 by 3 matrices A to satisfy these requirements (if possible):

(a) A has no zero entries but U = I.
(b) A has no zero entries but R = I.
(c) A has no zero entries but R = U.
(d) A = U = 2R.


11 Put 0’s and x’s (for zeros and nonzeros) in a 4 by 7 echelon matrix U so that the
pivot variables are

(a) 2, 4, 5 (b) 1, 3, 6, 7 (c) 4 and 6.


12 Put 0’s and 1’s and x’s (zeros, ones, and nonzeros) in a 4 by 8 reduced echelon matrix
R so that the free variables are

(a) 2, 4, 5, 6 (b) 1, 3, 6, 7, 8.


13 Suppose column 4 of a 3 by 5 matrix is all zero. Then x4 is certainly a ____
variable. The special solution for this variable is the vector x = ____.


14 Suppose the first and last columns of a 3 by 5 matrix are the same (not zero). Then
____ is a free variable. The special solution for this variable is x = ____.


15 Suppose an m by n matrix has r pivots. The number of special solutions is ____.
The nullspace contains only x = 0 when r = ____. The column space is all of R^m
when r = ____.


16 The nullspace of a 5 by 5 matrix contains only x = 0 when the matrix has ____
pivots. The column space is R^5 when there are ____ pivots. Explain why.


17 The equation x - 3y - z = 0 determines a plane in R^3. What is the matrix A in
this equation? Which are the free variables? The special solutions are (3, 1, 0) and
____.

18 The plane x - 3y - z = 12 is parallel to the plane x - 3y - z = 0 in Problem 17.
One particular point on this plane is (12, 0, 0). All points on the plane have the form
(fill in the first components)

\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \begin{bmatrix} \underline{\quad} \\ 0 \\ 0 \end{bmatrix}
+ y \begin{bmatrix} \underline{\quad} \\ 1 \\ 0 \end{bmatrix}
+ z \begin{bmatrix} \underline{\quad} \\ 0 \\ 1 \end{bmatrix}.
\]


19 If x is in the nullspace of B, prove that x is in the nullspace of AB. This means: If
Bx = 0 then ____. Give an example in which these nullspaces are different.


20 If A is invertible then N(AB) equals N(B). Following Problem 19, prove this second
part: If ABx = 0 then Bx = 0.

This means that Ux = 0 whenever LUx = 0 (same nullspace). The key is not that
L is triangular but that L is ____.

Questions 21-28 ask for matrices (if possible) with specific properties.


21 Construct a matrix whose nullspace consists of all combinations of (2, 2, 1, 0) and
(3, 1, 0, 1).


22 Construct a matrix whose nullspace consists of all multiples of (4, 3, 2, 1).


23 Construct a matrix whose column space contains (1, 1, 5) and (0, 3, 1) and whose
nullspace contains (1, 1, 2).


24 Construct a matrix whose column space contains (1, 1, 0) and (0, 1, 1) and whose
nullspace contains (1, 0, 1) and (0, 0, 1).


25 Construct a matrix whose column space contains (1, 1, 1) and whose nullspace is the
line of multiples of (1, 1, 1, 1).


26 Construct a 2 by 2 matrix whose nullspace equals its column space. This is possible.


27 Why does no 3 by 3 matrix have a nullspace that equals its column space?


28 If AB = 0 then the column space of B is contained in the ____ of A. Give an
example.


29 The reduced form R of a 3 by 3 matrix with randomly chosen entries is almost sure
to be ____. What R is most likely if the random A is 4 by 3?


30 Show by example that these three statements are generally false:

(a) A and A^T have the same nullspace.
(b) A and A^T have the same free variables.
(c) A and A^T have the same pivots. (The matrix may need a row exchange.) A and
A^T do have the same number of pivots. This will be important.

31 What is the nullspace matrix N (containing the special solutions) for A, B, C?

\[
A = \begin{bmatrix} I & I \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} I & I \\ 0 & 0 \end{bmatrix}
\quad\text{and}\quad
C = I.
\]


32 If the nullspace of A consists of all multiples of x = (2, 1, 0, 1), how many pivots 
appear in U?


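33 If the columns of N are the special solutions to Rx = 0, what are the nonzero rows
of R?

\[
N = \begin{bmatrix} 2 & 3 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}
\quad\text{and}\quad
N = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
\quad\text{and}\quad
N = \begin{bmatrix} \ \ \end{bmatrix} \ \text{(empty)}.
\]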


34 (a) What are the five 2 by 2 reduced echelon matrices R whose entries are all 0’s 
and 1’s? 


(b) What are the eight 1 by 3 matrices containing only 0’s and 1’s? Are all eight 
of them reduced echelon matrices? 


3.3 The Rank of A: Solving Ax = b 


This section moves forward with calculation and also with theory. While solving Ax = b, 
we answer one question about the column space and a different question about the nullspace.
Here are the questions:


1 Does b belong to the column space? Yes, if Ax = b has a solution. 
2 Is x = 0 alone in the nullspace? Yes, if the solution is unique. 


If there are solutions to Ax = b, we find them all. The complete solution is x = xp + xn.
To a particular solution xp we add all solutions of Axn = 0.

Elimination will get one last workout. The essential facts about A can be seen from 
the pivots. A solution exists when there is a pivot in every row. The solution is unique 
when there is a pivot in every column. The number of pivots, which is the controlling 
number for Ax = b, is given a name. This number is the rank of A. 


The Complete Solution to Ax = b 


The last section totally solved Ax = 0. Elimination converted the problem to Ux = 0. 
The free variables were given special values (one and zero). Then the pivot variables were 
found by back substitution. We paid no attention to the right side b because it started and 
ended as zero. The solution x was in the nullspace of A. 

Now b is not zero. Row operations on the left side must act also on the right side.
One way to organize that is to add b as an extra column of the matrix. We keep the same 
example as before. But we “augment” A with the right side (b1, b2, b3) = (1, 6, 7): 


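\[
\begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 8 & 10 \\ 3 & 6 & 11 & 14 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 1 \\ 6 \\ 7 \end{bmatrix}
\quad\text{has the augmented matrix}\quad
\begin{bmatrix} 1 & 2 & 3 & 4 & 1 \\ 2 & 4 & 8 & 10 & 6 \\ 3 & 6 & 11 & 14 & 7 \end{bmatrix}
\]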


Take the usual steps. Subtract 2 times row 1 from row 2, and 3 times row 1 from 
row 3: 


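\[
\begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 2 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 1 \\ 4 \\ 4 \end{bmatrix}
\quad\text{has the augmented matrix}\quad
\begin{bmatrix} 1 & 2 & 3 & 4 & 1 \\ 0 & 0 & 2 & 2 & 4 \\ 0 & 0 & 2 & 2 & 4 \end{bmatrix}
\]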


Now subtract the new row 2 from the new row 3 to reach Ux = c = last column: 


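\[
\begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 1 \\ 4 \\ 0 \end{bmatrix}
\quad\text{has the augmented matrix}\quad
\begin{bmatrix} 1 & 2 & 3 & 4 & 1 \\ 0 & 0 & 2 & 2 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]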


Are we making the point? It is not necessary to keep writing the letters x1, x2, x3, x4 and 
the equality sign. The augmented matrix contains all information for both sides of the 
equation. While A goes to U, b goes to c. The operations on A and b are the same, so they 
are done together in the augmented matrix. 

We worked with the specific vector b = (1, 6, 7), and we reached c = (1, 4, 0). Here 
are the same augmented matrices for a general vector (b1, b2, b3): 


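\[
\begin{bmatrix} 1 & 2 & 3 & 4 & b_1 \\ 2 & 4 & 8 & 10 & b_2 \\ 3 & 6 & 11 & 14 & b_3 \end{bmatrix}
\rightarrow
\begin{bmatrix} 1 & 2 & 3 & 4 & b_1 \\ 0 & 0 & 2 & 2 & b_2 - 2b_1 \\ 0 & 0 & 2 & 2 & b_3 - 3b_1 \end{bmatrix}
\rightarrow
\begin{bmatrix} 1 & 2 & 3 & 4 & b_1 \\ 0 & 0 & 2 & 2 & b_2 - 2b_1 \\ 0 & 0 & 0 & 0 & b_3 - b_2 - b_1 \end{bmatrix}
\]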


For the specific vector b = (1, 6, 7), the final entry is b3 - b2 - b1 = 7 - 6 - 1. The third
equation becomes 0 = 0. Therefore the system has a solution. This particular b is in the
column space of A.

For the general vector (b1, b2, b3), the equations might or might not be solvable.
The third equation has all zeros on the left side. So we must have b3 - b2 - b1 = 0 on
the right side. Elimination has identified this exact requirement for solvability. It is the
test for b to be in the column space. The requirement was satisfied by (1, 6, 7) because
7 - 6 - 1 = 0.
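
The row operations on the right side can be replayed for any b. The sketch below is a small Python illustration specific to this 3 by 4 example; the function names `reduce_rhs` and `solvable` are hypothetical, and the multipliers 2, 3, and 1 are the ones used in the elimination above.

```python
def reduce_rhs(b):
    """Apply to b the same row operations that took A to U, giving c."""
    b1, b2, b3 = b
    c2 = b2 - 2 * b1          # subtract 2 x row 1 from row 2
    c3 = b3 - 3 * b1          # subtract 3 x row 1 from row 3
    c3 = c3 - c2              # subtract the new row 2 from the new row 3
    return [b1, c2, c3]

def solvable(b):
    """Ax = b is solvable exactly when the zero row of U meets a zero in c."""
    return reduce_rhs(b)[2] == 0

c = reduce_rhs([1, 6, 7])
print(c)
```

The third component of c always equals b3 - b2 - b1, so `solvable` is the column space test in one line.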

Particular Solution and Homogeneous Solution


Now find the complete solution for b = (1, 6, 7). The last equation is 0 = 0. That leaves 
two equations in four unknowns. The two free variables are as free as ever. There is a good 
way to write down all the solutions to Ax = b: 


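Particular solution Set both free variables x2 and x4 to zero. Solve for x1 and x3:

\[
\begin{aligned} x_1 + 0 + 3x_3 + 0 &= 1 \\ 2x_3 + 0 &= 4 \end{aligned}
\quad\text{gives}\quad
\begin{aligned} x_1 &= -5 \\ x_3 &= 2. \end{aligned}
\]

This particular solution is xp = (-5, 0, 2, 0). For safety, check that Axp = b.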


Homogeneous solution The word “homogeneous” means that the right side is zero. But
Ax = 0 is already solved (last section). There are two special solutions, first with x2 = 1
and second with x4 = 1. The solutions to Ax = 0 come from the nullspace so we call
them xn:

\[
x_n = x_{\text{homogeneous}}
= x_2 \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} -1 \\ 0 \\ -1 \\ 1 \end{bmatrix}.
\tag{3.2}
\]


3C The complete solution is one particular solution plus all homogeneous solutions:
x = xp + xn is x_complete = x_particular + x_homogeneous.


To find every solution to Ax = b, start with one particular solution. Add to it every
solution to Axn = 0. This homogeneous part xn comes from the nullspace. Our particular
solution xp comes from solving with all free variables set to zero. Another book could
choose another particular solution.

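We write the complete solution xp + xn to this example as

\[
x = \begin{bmatrix} -5 \\ 0 \\ 2 \\ 0 \end{bmatrix}
+ x_2 \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} -1 \\ 0 \\ -1 \\ 1 \end{bmatrix}.
\tag{3.3}
\]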

There is a “double infinity” of solutions, because there are two free variables. 
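
As the text suggests, it is worth checking "for safety" that these vectors really solve Ax = b. This is a minimal Python sketch of that check; `matvec` and `complete_solution` are names invented for the illustration.

```python
def matvec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

A = [[1, 2, 3, 4], [2, 4, 8, 10], [3, 6, 11, 14]]
b = [1, 6, 7]
x_p = [-5, 0, 2, 0]   # particular solution: free variables set to zero
s1 = [-2, 1, 0, 0]    # special solution with x2 = 1
s2 = [-1, 0, -1, 1]   # special solution with x4 = 1

def complete_solution(x2, x4):
    """x = x_p + x2*s1 + x4*s2 for any choice of the two free variables."""
    return [p + x2 * u + x4 * v for p, u, v in zip(x_p, s1, s2)]

# Any values of the free variables give a solution of Ax = b.
for x2, x4 in [(0, 0), (1, 0), (0, 1), (3, -7)]:
    assert matvec(A, complete_solution(x2, x4)) == b
```

Choosing x2 = x4 = 0 recovers the particular solution itself.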

In the special case b = 0, the particular solution is the zero vector! The complete 
solution to Ax = 0 is just like equation (3.3), but we didn’t print xp = (0, 0, 0, 0). What 
we printed was equation (3.2). The solution plane in Figure 3.3 goes through xp = 0 when
b = 0.

All linear equations (matrix equations or differential equations) fit this pattern
x = xp + xn.


Here are the five steps to the complete solution xp + xn of Ax = b:

1 Add b as an extra column next to A. Reduce A to U by row operations.
2 Zero rows in U must have zeros also in the extra column. Those equations are 0 = 0.
3 Set the free variables to zero to find a particular solution xp.
4 Set each free variable, in turn, equal to 1. Find the special solutions to Ax = 0. Their
combinations give xn.
5 The complete solution to Ax = b is xp plus the complete solution to Ax = 0.

[Figure: the plane of solutions. xp solves Axp = b, xn solves Axn = 0, and their sum x solves Ax = b.]

Figure 3.3 The complete x = xp + xn is a particular solution plus any solution to
Ax = 0.


Reduction from Ax = b to Ux = c to Rx = d


By staying longer with forward elimination, we can make back substitution quicker. Instead
of stopping at U, go on to R. This is the reduced echelon form (still optional). All
its pivots equal 1. Below and above each pivot are zeros. We met this Gauss-Jordan 
elimination for Ax = 0, and now we apply it to Ax = b. 

Go back to our 3 by 4 example, with the right side b in column 5. For this augmented 
matrix, continue elimination from U to R. First divide row 2 by its pivot, so the new pivot 
is 1: 


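\[
\underbrace{\begin{bmatrix} 1 & 2 & 3 & 4 & 1 \\ 0 & 0 & 2 & 2 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}}_{[\,U \ \ c\,]}
\rightarrow
\begin{bmatrix} 1 & 2 & 3 & 4 & 1 \\ 0 & 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\rightarrow
\underbrace{\begin{bmatrix} 1 & 2 & 0 & 1 & -5 \\ 0 & 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}}_{[\,R \ \ d\,]}
\]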


The last step subtracted 3 times row 2 from row 1, the row above. The identity matrix
is now in columns 1 and 3, with zeros beneath it. The particular solution xp only uses
those pivot columns, since the free variables are set to zero. We can read off xp = (-5, 0, 2, 0)
directly from the new right side. When the matrix is I, the particular solution is right there.

The nullspace matrix N is also easy from R. Change column 5 to zero. The special
solution with x2 = 1 has x1 = -2 and x3 = 0 (exactly as before). The special solution
with x4 = 1 has x1 = -1 and x3 = -1 (also as before). These numbers come directly
from R, with signs reversed. Solutions are not changed; this is the point of elimination!
The solutions become clearer as we go from Ax = b to Ux = c to Rx = d.
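
Reading xp off the reduced augmented matrix can be sketched as follows. This Python illustration works with exact fractions and is not the book's MATLAB code; the function name `rref_augmented`, its `ncols` parameter, and the 0-based indexing are assumptions of the sketch.

```python
from fractions import Fraction

def rref_augmented(M, ncols):
    """Gauss-Jordan elimination on M, looking for pivots only in the first ncols columns."""
    R = [[Fraction(v) for v in row] for row in M]
    m = len(R)
    pivcols, row = [], 0
    for col in range(ncols):
        if row == m:
            break
        p = next((i for i in range(row, m) if R[i][col] != 0), None)
        if p is None:
            continue                         # free column
        R[row], R[p] = R[p], R[row]
        piv = R[row][col]
        R[row] = [v / piv for v in R[row]]   # pivot becomes 1
        for i in range(m):                   # clear the rest of the pivot column
            if i != row and R[i][col] != 0:
                mult = R[i][col]
                R[i] = [a - mult * c for a, c in zip(R[i], R[row])]
        pivcols.append(col)
        row += 1
    return R, pivcols

A = [[1, 2, 3, 4], [2, 4, 8, 10], [3, 6, 11, 14]]
b = [1, 6, 7]
n = 4
Rd, pivcols = rref_augmented([row + [rhs] for row, rhs in zip(A, b)], n)

# Zero rows of R must meet zeros in d, otherwise Ax = b has no solution.
is_solvable = all(r[-1] == 0 for r in Rd[len(pivcols):])

# Particular solution: free variables are 0, pivot variables come from d.
x_p = [Fraction(0)] * n
for k, p in enumerate(pivcols):
    x_p[p] = Rd[k][-1]
```

Restricting the pivot search to the first `ncols` columns keeps the right side b from ever being treated as a pivot column.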


Consistent = Solvable 


With (b1, b2, b3) on the right side, the equations might be solvable or not. Elimination
produced a third equation 0 = b3 - b2 - b1. To be solvable, this equation must be 0 = 0.

Translate that into: The column space of A contains all vectors with b3 - b2 - b1 = 0.
Here is the common sense reason. On the left side, row 3 minus row 2 minus row 1 leaves
a row of zeros. (Elimination discovered that.) The same combination of right sides must
give zero, if Ax = b is solvable.

In different words: Each column of A has component 3 minus component 2 minus
component 1 equal to zero. So if b is a combination of the columns, it also has
b3 - b2 - b1 = 0. Then the equations are “consistent”. They have a solution.


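Important for the future The three rows were multiplied by -1, -1, +1 to give the zero
row. This vector y = (-1, -1, +1) is in the nullspace of A^T:

\[
A^T y = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 8 & 11 \\ 4 & 10 & 14 \end{bmatrix}
\begin{bmatrix} -1 \\ -1 \\ +1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
\begin{array}{l} -\,\text{row 1 of } A \\ -\,\text{row 2 of } A \\ +\,\text{row 3 of } A \\ \text{total: zero} \end{array}
\]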


When a combination of rows gives the zero row, the same combination must give zero on
the right side. The dot product of y = (-1, -1, +1) with $b = (b_1, b_2, b_3)$ is $-b_1 - b_2 + b_3$.
This dot product must be zero, when Ax = b is solvable. That is the hidden relation
between y in the nullspace of $A^{\mathrm T}$ and b in the column space of A.
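
A small numerical check of this relation may help. The sketch below is in Python and assumes the 3 by 4 matrix A whose rows appear as the columns of $A^{\mathrm T}$ in the display above; the helper names `combine_rows` and `passes_test` are hypothetical. The test $y \cdot b = 0$ is necessary for solvability in general; for this particular A it is also sufficient, because this y is the only combination of rows (up to scaling) that gives the zero row.

```python
A = [[1, 2, 3, 4],
     [2, 4, 8, 10],
     [3, 6, 11, 14]]      # row 3 = row 1 + row 2
y = [-1, -1, 1]

def combine_rows(y, A):
    """Return the combination y[0]*row1 + y[1]*row2 + ... of the rows of A."""
    return [sum(y[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def passes_test(y, b):
    """True when the same combination of right sides gives zero."""
    return sum(yi * bi for yi, bi in zip(y, b)) == 0

zero_row = combine_rows(y, A)          # the zero row
ok = passes_test(y, [1, 6, 7])         # 7 - 6 - 1 = 0, so this b can be reached
bad = passes_test(y, [1, 6, 8])        # 8 - 6 - 1 = 1, so Ax = b has no solution
```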


Example 3.10 (Extra practice in reducing Ax = b to Ux = c and solving for x) 


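$$
\begin{bmatrix} 1 & 3 & 5 \\ 2 & 6 & 10 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
\quad\text{reduces to}\quad
\begin{bmatrix} 1 & 3 & 5 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 - 2b_1 \end{bmatrix}.
$$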


This is solvable only if $0 = b_2 - 2b_1$. So the column space of A is the line of vectors with
$b_2 = 2b_1$. Row 2 of the matrix is 2 × row 1. The free variables are $x_2$ and $x_3$. There is
only one pivot (here Rx = d is the same as Ux = c):


The particular solution has $x_2 = 0$ and $x_3 = 0$. Then $x_1 = b_1$.
The first special solution with b = 0 has $x_2 = 1$ and $x_3 = 0$. Then $x_1 = -3$.
The second special solution with b = 0 has $x_2 = 0$ and $x_3 = 1$. Then $x_1 = -5$.
The complete solution (assuming $b_2 = 2b_1$, otherwise no solution) is


$$
x = x_p + x_n = \begin{bmatrix} b_1 \\ 0 \\ 0 \end{bmatrix}
+ x_2 \begin{bmatrix} -3 \\ 1 \\ 0 \end{bmatrix}
+ x_3 \begin{bmatrix} -5 \\ 0 \\ 1 \end{bmatrix}.
$$


The Rank of a Matrix


The numbers m and n give the size of a matrix, but not necessarily the true size of a linear
system. An equation like 0 = 0 should not count. In our 3 by 4 example, the third row
of A was the sum of the first two. After elimination, the third row of U was zero. There
were only two pivots and therefore two pivot variables. This number r = 2 is the rank of the
matrix A. It is also the rank of U.

The rank r is emerging as the key to the true size of the system.


Definition 1 The rank of A is the number of pivots (nonzero of course). 


Definition 2 The rank of A is the number of independent rows. 


The first definition is computational. The pivots spring from the elimination steps—we 
watch for nonzeros in certain positions. Counting those pivots is at the lowest level of 
linear algebra. A computer can do it for us (which is good). 

Actually the computer has a hard time deciding whether a small number is really
zero. When it subtracts 3 times 0.33...3 from 1, does it obtain zero? Our teaching codes
use the tolerance $10^{-6}$, but that is not entirely safe.
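
The effect of the tolerance is easy to see in floating point. The following Python sketch is not the book's teaching code; `rank_with_tolerance` is a hypothetical helper that counts pivots by elimination with row exchanges and treats any candidate pivot no larger than `tol` as zero. The absolute tolerance is an assumption of this sketch; a scaled tolerance would often be a better choice.

```python
def rank_with_tolerance(rows, tol=1e-6):
    """Count pivots, treating entries with absolute value <= tol as zero."""
    M = [[float(v) for v in row] for row in rows]
    m, n = len(M), len(M[0])
    r = 0
    for c in range(n):
        if r == m:
            break
        # choose the largest available entry in column c as the pivot candidate
        p = max(range(r, m), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) <= tol:
            continue                      # too small: no pivot in this column
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, m):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# In exact arithmetic the two rows are equal, so the rank is 1.
# In floating point 0.1 + 0.2 differs from 0.3 by a tiny roundoff error.
A = [[1.0, 0.1 + 0.2],
     [1.0, 0.3]]
rank_strict = rank_with_tolerance(A, tol=0.0)   # roundoff is counted as a pivot
rank_safe = rank_with_tolerance(A)              # roundoff is treated as zero
```

With a zero tolerance the roundoff error is mistaken for a second pivot; with the tolerance $10^{-6}$ the computed rank is 1.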

The second definition of rank is at a higher level. It deals with entire rows—vectors 
and not just numbers. We have to say exactly what it means for rows to be “independent.” 
That crucial idea comes in the next section. 

A third definition, at the top level of linear algebra, will deal with spaces of vectors. 
The rank r gives the size or “dimension” of the row space. The great thing is that r also 
reveals the dimension of all other important subspaces—including the nullspace. 


1 3 4 1 3 1 
Example 3.11 ; 6 sf and | g | and | 5 | and (11 at have rank 1. 


This much we can already say about the rank: There are r pivot variables because 
there are r pivots. This leaves n — r free variables. The pivot variables correspond to the r 
columns with pivots. The free variables correspond to the n — r columns without pivots. 

Certainly the rank satisfies r ≤ m, since m rows can’t contain more than m pivots.
Also r ≤ n, since n columns can’t contain more than n pivots. The number of pivots (which
is r) cannot exceed m or n. The extreme cases, when r is as large as possible, are in many
ways the best for Ax = b:


3D If r = n there are no free variables. Ax = b cannot have two different solutions 
(uniqueness). 
If r = m there are no zero rows in U. Ax = b has at least one solution (existence). 


Start with r = n pivots. There are no free variables (n — r = 0). Back substitution gives 
all n components of x. The nullspace contains only x = 0. The shape of A is tall and 
thin—it has at least n rows and maybe more. The solution is unique if it exists, but it might 
not exist. The extra rows might not lead to 0 = 0. 


The other extreme has r = m pivots. There are no zero rows in U. The column space
contains every vector b. The shape of A is short and wide: it has at least m columns and
maybe more. A solution to Ax = b always exists, but it might not be unique. Any extra
columns lead to free variables which give more solutions.


With r = n there are 0 or 1 solutions. The columns are “independent.” The matrix has 
“full column rank.” With r = m there is 1 solution, or infinitely many. The rows are 


independent. The matrix has “full row rank.” With m = n = r there is exactly one 
solution. The matrix is square and invertible. 
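
The four cases above depend only on the three numbers m, n, r. As an illustration, here is a minimal Python sketch (the function name `solution_counts` is made up) that returns the possible number of solutions to Ax = b as b varies.

```python
def solution_counts(m, n, r):
    """Possible numbers of solutions to Ax = b for an m by n matrix of rank r."""
    if not 0 <= r <= min(m, n):
        raise ValueError("the rank cannot exceed m or n")
    if r == m and r == n:
        return "exactly 1"                 # square and invertible
    if r == n:
        return "0 or 1"                    # full column rank, extra rows
    if r == m:
        return "infinitely many"           # full row rank, extra columns
    return "0 or infinitely many"          # zero rows in U and free variables

tall = solution_counts(3, 2, 2)
wide = solution_counts(2, 3, 2)
square = solution_counts(2, 2, 2)
deficient = solution_counts(3, 3, 2)
```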

The most important case is m = n = r. The whole of Chapter 2 was devoted to this
square invertible case, when Ax = b has exactly one solution $x = A^{-1}b$. Here are four
examples, all with two pivots. The rank of each matrix is r = 2:


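$$
A = \begin{bmatrix} 1 & 3 \\ 2 & 8 \\ 1 & 7 \end{bmatrix}
\qquad
\begin{array}{ll} m = 3 & \text{one zero row in } U \\ n = 2 & \text{no free variables} \\ r = 2 & \text{unique solution (if it exists)} \end{array}
\qquad
U = \begin{bmatrix} 1 & 3 \\ 0 & 2 \\ 0 & 0 \end{bmatrix}
$$

$$
A = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 8 & 7 \end{bmatrix}
\qquad
\begin{array}{ll} m = 2 & \text{no zero rows in } U \\ n = 3 & \text{one free variable} \\ r = 2 & \text{solution exists (not unique)} \end{array}
\qquad
U = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 4 \end{bmatrix}
$$

$$
A = \begin{bmatrix} 1 & 2 \\ 3 & 8 \end{bmatrix}
\qquad
\begin{array}{ll} m = 2 & \text{solution exists} \\ n = 2 & \text{solution is unique} \\ r = 2 & A \text{ is invertible} \end{array}
\qquad
U = \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix}
$$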
1227 m=3 a 121 
hae Ba). wes ee 2 u=|0 2 4 
3 g7 — solution may not exis 00 0 


solution is not unique 


We have a first definition of rank, by counting pivots. Now we need the concept of indepen- 
dence. Then we are ready for the cornerstone of this subject—the Fundamental Theorem 
of Linear Algebra. 


Summary of the Situation 


From Ax = b
    We can’t see if solutions exist
    We can’t see what the solutions are


The vector b might not be in the column space of A. Forward elimination takes A to U. 
The right side goes from b to c: 


From Ux = c
    We can see if solutions exist
    We can’t see what the solutions are


Every zero row in U must be matched by a zero in c. These are the last m — r rows of U 
and the last m — r components of c. Then solve finds x by back substitution, with all free 
variables set to zero. Now eliminate upward to reach R, as the right side changes to d: 


From Rx = d
    We can see if solutions exist
    We can see what the solutions are

The reduced matrix R has the identity matrix I in its pivot columns. If those columns come
first, the equations Rx = d look like

$$
Rx = \begin{bmatrix} I & F \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} x_p \\ x_f \end{bmatrix}
= \begin{bmatrix} d_p \\ d_f \end{bmatrix}
\qquad
\begin{array}{l} r \text{ pivot rows} \\ m - r \text{ zero rows} \end{array}
$$

The block columns contain the r pivot columns and the n - r free columns.

The zero rows in R must be matched by $d_f = 0$. Then solve for x. The particular solution
has all free variables zero: $x_f = 0$ and $x_p = d_p$ (because R contains I).

$$
x_{\text{complete}} = x_{\text{particular}} + x_{\text{homogeneous}}
= \begin{bmatrix} d_p \\ 0 \end{bmatrix} + \begin{bmatrix} -F \\ I \end{bmatrix} x_f.
$$

That is the nullspace matrix N multiplying $x_f$. It contains the special solutions, with 1’s
and 0’s for free variables in I. All solutions to Rx = 0 and Ax = 0 are combinations $N x_f$
of the special solutions. For example:


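$$
[\,A \;\; b\,] \longrightarrow [\,U \;\; c\,] \longrightarrow [\,R \;\; d\,]
= \left[\begin{array}{ccc|c} 1 & 0 & 1 & 3 \\ 0 & 1 & -1 & 2 \end{array}\right]
$$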


There are r = 2 pivots and n - r = 3 - 2 free variables. There are no zero rows in U
and R, so the $d_f$ part of the right side is empty. Solutions exist. The particular solution
contains 3 and 2 from the last right side d. The homogeneous solution contains -1 and 1
from the free column of R, with the signs reversed:


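$$
x_{\text{complete}} = x_{\text{particular}} + x_{\text{nullspace}}
= \begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}.
$$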
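
The block formula $N = \begin{bmatrix} -F \\ I \end{bmatrix}$ can be tried on this example. The Python sketch below is an illustration only: it assumes the pivot columns come first, takes R and d from the numbers quoted in the text (3 and 2 on the right side, free column 1 and -1), and uses hypothetical helper names.

```python
from fractions import Fraction

def nullspace_matrix(R, r):
    """R has the identity in its first r columns. Returns N = [-F; I] as rows."""
    n = len(R[0])
    top = [[-Fraction(v) for v in row[r:]] for row in R[:r]]          # -F
    bottom = [[Fraction(int(i == j)) for j in range(n - r)]
              for i in range(n - r)]                                   # I
    return top + bottom

def complete_solution(R, d, r, x_f):
    """x_p + N x_f for a given choice of the free variables x_f."""
    n = len(R[0])
    N = nullspace_matrix(R, r)
    x_p = [Fraction(v) for v in d[:r]] + [Fraction(0)] * (n - r)
    return [x_p[i] + sum(N[i][j] * x_f[j] for j in range(n - r)) for i in range(n)]

R = [[1, 0, 1],
     [0, 1, -1]]
d = [3, 2]
N = nullspace_matrix(R, 2)                 # one column: (-1, 1, 1)
x = complete_solution(R, d, 2, [5])        # free variable x_3 = 5
Rx = [sum(R[i][j] * x[j] for j in range(3)) for i in range(2)]
```

Any value of the free variable gives a solution; with $x_3 = 5$ the sketch produces x = (-2, 7, 5), and multiplying by R returns d.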


Problem Set 3.3 


Questions 1-12 are about the solution of Ax = b. Follow the five steps in the text to
find $x_p$ and $x_n$.


1 Write the complete solution in the form of equation (3.3):

$$
\begin{aligned} x + 3y + 3z &= 1 \\ 2x + 6y + 9z &= 5 \\ -x - 3y + 3z &= 5. \end{aligned}
$$


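2 Find the complete solution (also called the general solution) to

$$
\begin{bmatrix} 1 & 3 & 1 & 2 \\ 2 & 6 & 4 & 8 \\ 0 & 0 & 2 & 4 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}.
$$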




3 Under what condition on $b_1, b_2, b_3$ is this system solvable? Include b as a fourth
column in elimination. Find all solutions:

$$
\begin{aligned} x + 2y - 2z &= b_1 \\ 2x + 5y - 4z &= b_2 \\ 4x + 9y - 8z &= b_3. \end{aligned}
$$


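4 Under what conditions on $b_1, b_2, b_3, b_4$ is each system solvable? Find x in that case.

$$
\begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 2 & 5 \\ 3 & 9 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}
\qquad
\begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 2 & 5 & 7 \\ 3 & 9 & 12 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}.
$$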


5 Show by elimination that $(b_1, b_2, b_3)$ is in the column space of A if
$b_3 - 2b_2 + 4b_1 = 0$.

$$
A = \begin{bmatrix} 1 & 3 & 1 \\ 3 & 8 & 2 \\ 2 & 4 & 0 \end{bmatrix}
$$

What combination of the rows of A gives the zero row?


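6 Which vectors $(b_1, b_2, b_3)$ are in the column space of A? Which combinations of the
rows of A give zero?

$$
\text{(a)}\ A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 6 & 3 \\ 0 & 2 & 5 \end{bmatrix}
\qquad
\text{(b)}\ A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 2 & 4 & 8 \end{bmatrix}
$$

7 Construct a 2 by 3 system Ax = b with particular solution $x_p = (2, 4, 0)$ and
homogeneous solution $x_n$ = any multiple of (1, 1, 1).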


8 Why can’t a 1 by 3 system have $x_p = (2, 4, 0)$ and $x_n$ = any multiple of (1, 1, 1)?


9 (a) If Ax = b has two solutions $x_1$ and $x_2$, find two solutions to Ax = 0.

(b) Then find another solution to Ax = 0 and another solution to Ax = b.


10 Explain why these are all false:

(a) The complete solution is any linear combination of $x_p$ and $x_n$.

(b) A system Ax = b has at most one particular solution.

(c) The solution $x_p$ with all free variables zero is the shortest solution (minimum
length ||x||). Find a 2 by 2 counterexample.

(d) If A is invertible there is no homogeneous solution $x_n$.


11 Suppose column 5 of U has no pivot. Then $x_5$ is a ____ variable. The zero vector
(is) (is not) the only solution to Ax = 0. If Ax = b has a solution, then it has ____
solutions.


12 Suppose row 3 of U has no pivot. Then that row is ____. The equation Ux = c
is only solvable provided ____. The equation Ax = b (is) (is not) (might not be)
solvable.


Questions 13-18 are about matrices of “full rank” r = m or r = n.




13 The largest possible rank of a 3 by 5 matrix is ____. Then there is a pivot in every
____ of U. The solution to Ax = b (always exists) (is unique). The column space
of A is ____. An example is A = ____.


14 The largest possible rank of a 6 by 4 matrix is ____. Then there is a pivot in every
____ of U. The solution to Ax = b (always exists) (is unique). The nullspace of A
is ____. An example is A = ____.


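15 Find by elimination the rank of A and also the rank of $A^{\mathrm T}$:

$$
A = \begin{bmatrix} 1 & 4 & 0 \\ 2 & 11 & 5 \\ -1 & 2 & 10 \end{bmatrix}
\quad\text{and}\quad
A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 2 \\ 1 & 1 & q \end{bmatrix}
\ \text{(rank depends on } q\text{)}.
$$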


16 Find the rank of A and also of $A^{\mathrm T}A$ and also of $AA^{\mathrm T}$:


2 0 
a=) s 7 and A=|1 1 
i 2 


1 0 1 
17 Reduce A to its echelon form U. Then find a triangular L so that A = LU.
1 0 1 0 
SFR and A=|2 2 0 3 
065 4 


18 Find the complete solution in the form (3.3) to these full rank systems:


x+y+z=4 


a x+y+z=4 b 
(a) y (b) Gasp heed: 


19 If Ax = b has infinitely many solutions, why is it impossible for Ax = B (new right
side) to have only one solution? Could Ax = B have no solution?


20 Choose the number q so that (if possible) the rank is (a) 1, (b) 2, (c) 3:

$$
A = \begin{bmatrix} 6 & 4 & 2 \\ -3 & -2 & -1 \\ 9 & 6 & q \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 3 & 1 & 3 \\ q & 2 & q \end{bmatrix}.
$$

Questions 21-26 are about matrices of rank r = 1.

21 Fill out these matrices so that they have rank 1:


12 4 ME 
AS 2 and B=] 1 and “=| i 
4 c 


22 If A is an m by n matrix with r = 1, its columns are multiples of one column and its
rows are multiples of one row. The column space is a ____ in $R^m$. The nullspace
is a ____ in $R^n$. Also the column space of $A^{\mathrm T}$ is a ____ in $R^n$.


23 Choose vectors u and v so that $A = uv^{\mathrm T}$ = column times row:


3 6 6 
As) 2 2 and Fp e F) 
4 8 8 


$A = uv^{\mathrm T}$ is the natural form for every matrix that has rank r = 1.


24 If A is a rank one matrix, the second row of U is ____. Do an example.


25 Multiply a rank one matrix times a rank one matrix, to find the rank of AB and AM: 
1 2 2 1 4 l b 
ie A i s=; 1.5 4 ae “=|; at 


26 The rank one matrix $uv^{\mathrm T}$ times the rank one matrix $wz^{\mathrm T}$ is $uz^{\mathrm T}$ times the number
____. This has rank one unless ____ = 0.


27 Give examples of matrices A for which the number of solutions to Ax = b is

(a) 0 or 1, depending on b
(b) ∞, regardless of b
(c) 0 or ∞, depending on b
(d) 1, regardless of b.
28 Write down all known relations between r and m and n if Ax = b has 


(a) no solution for some b 
(b) infinitely many solutions for every b 
(c) exactly one solution for some b, no solution for other b 


(d) exactly one solution for every b.

Questions 29-33 are about Gauss-Jordan elimination and the reduced echelon matrix R.




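29 Continue elimination from U to R. Divide rows by pivots so the new pivots are all 1.
Then produce zeros above those pivots to reach R:

$$
U = \begin{bmatrix} 2 & 4 & 4 \\ 0 & 3 & 6 \\ 0 & 0 & 0 \end{bmatrix}
\quad\text{and}\quad
U = \begin{bmatrix} 2 & 4 & 4 \\ 0 & 3 & 6 \\ 0 & 0 & 5 \end{bmatrix}.
$$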


30 Suppose U is square with n pivots (an invertible matrix). Explain why R = I.


31 Apply Gauss-Jordan elimination to Ux = 0 and Ux = c. Reach Rx = 0 and
Rx = d:


12 3 0 I 23 3 
i =o 0 4 d an eJ=fo 0 4 I 
Solve Rx = 0 to find $x_n$ (its free variable is $x_2 = 1$). Solve Rx = d to find $x_p$ (its
free variable is $x_2 = 0$).


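32 Gauss-Jordan elimination yields the reduced matrix R. Find Rx = 0 and Rx = d:

$$
[\,U \;\; 0\,] = \begin{bmatrix} 3 & 0 & 6 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\quad\text{and}\quad
[\,U \;\; c\,] = \begin{bmatrix} 3 & 0 & 6 & 9 \\ 0 & 0 & 2 & 4 \\ 0 & 0 & 0 & 5 \end{bmatrix}.
$$

Solve Ux = 0 or Rx = 0 to find $x_n$ (free variable = 1). What are the solutions to
Rx = d?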


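33 Reduce Ax = b to Ux = c (Gaussian elimination) and then to Rx = d (Gauss-Jordan):

$$
Ax = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 1 & 3 & 2 & 0 \\ 2 & 0 & 4 & 9 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 2 \\ 5 \\ 10 \end{bmatrix} = b.
$$

Find a particular solution $x_p$ and all homogeneous solutions $x_n$.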


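34 Find matrices A and B with the given property or explain why you can’t: The only
solution of $Ax = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ is $x = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. The only solution of $Bx = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ is $x = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.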


35 Find the LU factorization of A and the complete solution to Ax = b:


1 3 d l 1 
1 2 3 3 0 
A= 246 and b= 6 and then b = 0 
1 1 § 5 0 


36 The complete solution to $Ax = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$ is $x = \begin{bmatrix} 1 \\ 0 \end{bmatrix} + c \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Find A.


3.4 Independence, Basis, and Dimension


This important section is about the true size of a subspace. The columns of A have m
components. But the true “dimension” of the column space is not necessarily m (unless
the subspace is the whole space $R^m$). The dimension is measured by counting independent
columns, and we have to say what that means.

The idea of independence applies to any vectors $v_1, \ldots, v_n$ in any vector space. Most
of this section concentrates on the subspaces that we know and use, especially the column
space and nullspace. In the last part we also study “vectors” that are not column vectors.
Matrices and functions can be linearly independent (or not). First come the key examples 
using column vectors, and then extra examples with matrices and functions. 

The final goal is to understand a basis for a vector space—like i = (1,0) and j = 
(0, 1) in the xy plane. We are at the heart of our subject, and we cannot go on without a 
basis. If you include matrix examples and function examples, allow extra time. The four 
essential ideas in this section are 


1 Independent vectors 
2 Basis for a space 
3 Spanning a space 


4 Dimension of a space. 


Linear Independence 


The two vectors (3, 1) and (6, 2) are not independent. One vector is a multiple of the 
other. They lie on a line—which is a one-dimensional subspace. The vectors (3, 1) and 
(4, 2) are independent—they go in different directions. Also the vectors (3, 1) and (7, 3) 
are independent. But the three vectors (3, 1) and (4, 2) and (7, 3) are not independent— 
because the first vector plus the second vector equals the third. 

The key question is: Which combinations of the vectors give zero? 


Definition The sequence of vectors $v_1, \ldots, v_n$ is linearly independent if the only
combination that gives the zero vector is $0v_1 + 0v_2 + \cdots + 0v_n$. Thus linear independence
means that

$$
c_1 v_1 + c_2 v_2 + \cdots + c_n v_n = 0 \quad \text{only happens when all } c\text{’s are zero.}
$$

If a combination $\sum c_i v_i$ gives 0, when the c’s are not all zero, the vectors are dependent.
Correct language: “The sequence of vectors is linearly independent.” Acceptable
shortcut: “The vectors are independent.”


A collection of vectors is either dependent or independent. They can be combined to give
the zero vector (with nonzero c’s) or they can’t.

Figure 3.4 (a) Dependent (on a line) (b) Independent (c) A combination of the vectors
equals zero.


The examples with (3, 1) and (6, 2) and (4, 2) are drawn in Figure 3.4:


Dependent: 2(3, 1) - (6, 2) = (0, 0)   $c_1 = 2$ and $c_2 = -1$
Independent: Only 0(3, 1) + 0(4, 2) = (0, 0)   $c_1 = 0$ and $c_2 = 0$
Dependent: (3, 1) + (4, 2) - (7, 3) = (0, 0)   $c_1 = 1$, $c_2 = 1$, $c_3 = -1$


The c’s in the first part of Figure 3.4 could also be 4 and —2. For independent vectors the 
only choice is 0 and 0. The c’s in the third case could be 7, 7, and —7. 

The test for independent vectors is: Which combinations equal zero? For n columns 
in a matrix A, the combinations are exactly Ax. We are asking about Ax = 0. Are there 
any nonzero vectors in the nullspace? If so, those columns are dependent. 


3E The columns of A are independent if the only solution to Ax = 0 is x = 0. Elimina- 
tion produces no free variables. The matrix has rank r = n. The nullspace contains only 
x=0. 


That is the definition of linear independence, written specifically for column vectors. The
vectors go into the columns of A, and the c’s go into x. Then solve Ax = 0 by elimination.
If there are any free variables, there will be nonzero solutions, so the vectors are dependent.
The components of x are the numbers $c_1, \ldots, c_n$ that we are looking for.
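
The test in 3E can be written as a short program: put the vectors into the columns of A, eliminate, and compare the number of pivots with n. The following is a minimal Python sketch with exact fractions; `rank` and `independent` are illustrative names, not the book's teaching codes.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by forward elimination (exact arithmetic)."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        if r == len(M):
            break
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue                          # free column, no pivot
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def independent(vectors):
    """True when Ax = 0 has only x = 0, where the vectors are the columns of A."""
    A = [list(row) for row in zip(*vectors)]  # the vectors become columns
    return rank(A) == len(vectors)            # no free variables: r = n

on_a_line = independent([(3, 1), (6, 2)])
two_directions = independent([(3, 1), (4, 2)])
three_in_plane = independent([(3, 1), (4, 2), (7, 3)])
```

The three calls reproduce the three cases of Figure 3.4: dependent, independent, dependent.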


Example 3.12 The columns of this matrix are dependent: 


2 -4 -7 2 0 0 
Axes) 2 8 3S kO so 2 +3 — 1 =k 
2 -E 7 0 0 


The rank of A is only r = 2. Independent columns would give full rank r = n = 3.

In that matrix the rows are also dependent. You can see a combination of those rows
that gives the zero row (it is row 1 minus row 3). For a square matrix, we will show that
dependent columns imply dependent rows and vice versa.


Another way to describe linear dependence is this: “One of the vectors is a combination 
of the other vectors.” That sounds clear. Why don’t we say this from the start? Instead 
our definition was longer: “Some combination gives the zero vector, other than the trivial 
combination with every c = 0.” We must rule out the easy way to get the zero vector. That 
trivial combination of zeros gives every author a headache. In the first statement, the vector 
that is a combination of the others has coefficient c = 1. 

The point is, our definition doesn’t pick out one particular vector as guilty. All 
columns of A are treated the same. We look at Ax = 0, and it has a nonzero solution or it 
hasn’t. In the end that is better than asking if the last column (or the first, or a column in 
the middle) is a combination of the others. 


One case is of special importance. Suppose seven columns have five components each
(m = 5 is less than n = 7). Then the columns must be dependent! Any seven vectors
from $R^5$ are dependent. The rank of A cannot be larger than 5. There cannot be more than
five pivots in five rows. The system Ax = 0 has at least 7 - 5 = 2 free variables, so it has
nonzero solutions, which means that the columns are dependent.


3F Any set of n vectors in $R^m$ must be linearly dependent if n > m.


This is exactly the statement in Section 3.2, when Ax = 0 has more unknowns than
equations. The matrix has more columns than rows: it is short and wide. The columns are
dependent if n > m, because Ax = 0 has a nonzero solution.


Vectors That Span a Subspace 


The first subspace in this book was the column space. Starting with n columns $v_1, \ldots, v_n$,
the subspace was filled out by including all combinations $x_1 v_1 + \cdots + x_n v_n$. The column
space consists of all linear combinations of the columns. We now introduce the single
word “span” to describe this: The column space is spanned by the columns.


Definition A set of vectors spans a space if their linear combinations fill the space.


To repeat: The columns of A span the column space. Don’t say “the matrix spans the
column space.” The idea of taking linear combinations is familiar, and the word span says
it more quickly. If a space V consists of all linear combinations of the particular vectors
$v_1, \ldots, v_n$, then these vectors span V.

The smallest space containing those vectors is the space V that they span. We have 
to be able to add vectors and multiply by scalars. We must include all linear combinations 
to produce a vector space. 


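Example 3.13 The vectors $v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $v_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ span the two-dimensional space $R^2$.


Example 3.14 The three vectors $v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $v_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $v_3 = \begin{bmatrix} 4 \\ 7 \end{bmatrix}$ also span the same
space $R^2$.

Example 3.15 The vectors $w_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $w_2 = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$ only span a line in $R^2$. So does $w_1$
by itself. So does $w_2$ by itself.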


Think of two vectors coming out from (0, 0, 0) in 3-dimensional space. Generally they 
span a plane. Your mind fills in that plane by taking linear combinations. Mathematically 
you know other possibilities: two vectors spanning a line, three vectors spanning all of R3, 
three vectors spanning only a plane. It is even possible that three vectors span only a line, 
or ten vectors span only a plane. They are certainly not independent! 

The columns span the column space. Here is a new subspace—which begins with the 
rows. The combinations of the rows produce the “row space.” 


Definition The row space of a matrix is the subspace of $R^n$ spanned by the rows.


The rows of an m by n matrix have n components. They are vectors in $R^n$, or they would
be if they were written as column vectors. There is a quick way to do that: Transpose the
matrix. Instead of the rows of A, look at the columns of $A^{\mathrm T}$. Same numbers, but now in
columns.

The row space of A is $R(A^{\mathrm T})$. It is the column space of $A^{\mathrm T}$. It is a subspace of $R^n$.
The vectors that span it are the columns of $A^{\mathrm T}$, which are the rows of A.


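Example 3.16 $A = \begin{bmatrix} 1 & 4 \\ 2 & 7 \\ 3 & 5 \end{bmatrix}$ and $A^{\mathrm T} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 7 & 5 \end{bmatrix}$. Here m = 3 and n = 2.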


The column space of A is spanned by the two columns of A. It is a plane in $R^3$. The row
space of A is spanned by the three rows of A (columns of $A^{\mathrm T}$). It is all of $R^2$. Remember:
The rows are in $R^n$. The columns are in $R^m$. Same numbers, different vectors, different
spaces.


A Basis for a Vector Space 


In the xy plane, a set of independent vectors could be small, just one single vector. A
set that spans the xy plane could be large: three vectors, or four, or infinitely many. One
vector won’t span the plane. Three vectors won’t be independent. A “basis” for the plane
is just right.


Definition A basis for a vector space is a sequence of vectors that has two properties at 
once: 


1 The vectors are linearly independent. 


2 The vectors span the space. 


This combination of properties is fundamental to linear algebra. Every vector v in the
space is a combination of the basis vectors, because they span the space. More than that,
the combination that produces v is unique, because the basis vectors $v_1, \ldots, v_n$ are
independent:


There is one and only one way to write v as a combination of the basis vectors.

Reason: Suppose $v = a_1 v_1 + \cdots + a_n v_n$ and also $v = b_1 v_1 + \cdots + b_n v_n$. By subtraction
$(a_1 - b_1)v_1 + \cdots + (a_n - b_n)v_n$ is the zero vector. From the independence of the v’s, each
$a_i - b_i = 0$. Hence $a_i = b_i$.


Example 3.17 The columns of $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ are a basis for $R^2$. This is the “standard basis.”


The basis vectors are $i = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $j = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. They are independent. They span $R^2$.


Everybody thinks of this basis first. The vector i goes across and j goes straight up.
Similarly the columns of the 3 by 3 identity matrix are the standard basis i, j, k. The
columns of the n by n identity matrix give the “standard basis” for $R^n$. Now we find other
bases.


Example 3.18 (Important) The columns of any invertible n by n matrix give a basis 
for R”: 


1 0 0 
R and A=|1 1 0 bet es k 
2A 2 4 

1 1 1 
When A is invertible, its columns are independent. The only solution to Ax = 0 is x = 0.
The columns span the whole space $R^n$, because every vector b is a combination of the
columns. Ax = b can always be solved by $x = A^{-1}b$. Do you see how everything comes
together? Here it is in one sentence:


3G The vectors $v_1, \ldots, v_n$ are a basis for $R^n$ exactly when they are the columns of an n
by n invertible matrix. Thus $R^n$ has infinitely many different bases.


When the columns are independent, they are a basis for the column space. When the 
columns are dependent, we keep only the pivot columns—the r columns with pivots. The 
picture is clearest for an echelon matrix U. 

The pivot columns of A are one basis for the column space of A—which is different 
from the column space of U. Section 3.5 will study these bases carefully. 


3H The pivot columns are a basis for the column space. The pivot rows are a basis for 
the row space. 


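Example 3.19 For an echelon matrix those bases are clear:

$$
U = \begin{bmatrix} 1 & 2 & 1 & 4 & 7 \\ 0 & 0 & 5 & 3 & 8 \\ 0 & 0 & 0 & 6 & 9 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\quad\text{has pivot columns}\quad
\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},\
\begin{bmatrix} 1 \\ 5 \\ 0 \\ 0 \end{bmatrix},\
\begin{bmatrix} 4 \\ 3 \\ 6 \\ 0 \end{bmatrix}.
$$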


Columns 1, 3, 4 are a basis for the column space of U. So are columns 2, 3, 4 and 1, 3, 5
and 2, 4, 5. (But not columns 1 and 4 by themselves, and not 1, 2, 4.) There are infinitely
many bases (not just these columns!). But the pivot columns are the natural choice.

How to show that the pivot columns are independent? Start with a combination that
gives zero. Then show that all the c’s must be zero:

$$
c_1 \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ c_2 \begin{bmatrix} 1 \\ 5 \\ 0 \\ 0 \end{bmatrix}
+ c_3 \begin{bmatrix} 4 \\ 3 \\ 6 \\ 0 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}. \tag{3.4}
$$

The third component gives $6c_3 = 0$. Therefore $c_3 = 0$. Then the second component gives
$5c_2 = 0$ so $c_2 = 0$. Now the first component gives $c_1 = 0$. The pivots “stick out.” We use
them in order (back substitution) to prove that the c’s are zero. Please note how a proof of
independence starts, with a combination (3.4) that gives zero.


Important: Every vector b in the column space can be produced out of the three pivot 
columns 1, 3, 4—by solving Ax = b. The pivot columns span the column space. Those 
columns are independent. So they are a basis. 

The three nonzero rows of U are a basis for the row space. They span the row space 
(the zero row adds nothing). The three rows are independent (again the pivots stick out). 
Are they the only basis? Never. 


Question 1 If $v_1, \ldots, v_n$ is a basis for $R^n$, can we use some of the v’s in a basis for the
row space?

Answer Not always. The v’s might not be in the row space. To find a basis for a sub- 
space S, we have to look at S. One thing is sure: The subspace won’t have more than n 
basis vectors. 


Question 2 Given five vectors, how do you find a basis for the space they span? 
Answer Make them the rows of A, and eliminate to find the nonzero rows of U. 

Second way Put the five vectors into the columns of A. Eliminate to find the pivot 
columns (of A not of U!). The program basis uses the column numbers from pivcol. 


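function B = basis(A)       % M-file: basis
% The columns of B are the pivot columns of A (a basis for its column space).
[P,L,U,pivcol] = plu(A);    % plu is the teaching code that also returns pivcol
B = A(:,pivcol);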
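
For readers without the teaching codes, the same idea can be sketched in Python. This is not a translation of plu: the helper `pivot_columns` below is a stand-in that simply records where elimination finds pivots, using exact fractions, and the column indices are 0-based rather than MATLAB's 1-based numbering.

```python
from fractions import Fraction

def pivot_columns(A):
    """Indices (0-based) of the columns where elimination finds a pivot."""
    M = [[Fraction(v) for v in row] for row in A]
    pivcol, r = [], 0
    for c in range(len(M[0])):
        if r == len(M):
            break
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivcol.append(c)
        r += 1
    return pivcol

def basis(A):
    """Keep the pivot columns of A itself (not of U)."""
    cols = pivot_columns(A)
    return [[row[c] for c in cols] for row in A]

# Columns are the vectors (3,1), (6,2), (4,2), (7,3) from this section.
A = [[3, 6, 4, 7],
     [1, 2, 2, 3]]
pivcol = pivot_columns(A)
B = basis(A)
```

Here the pivots fall in the first and third columns, so the basis returned for the column space consists of (3, 1) and (4, 2).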


The column space of U had r = 3 basis vectors. Could another basis have more than 
r vectors, or fewer? This is a crucial question with a good answer. All bases for a vector 
space contain the same number of vectors. This number is the “dimension.” 


Dimension of a Vector Space 


We have to prove what was just stated. There are many choices for the basis vectors, but 
the number of basis vectors doesn’t change. 


3I If $v_1, \ldots, v_m$ and $w_1, \ldots, w_n$ are both bases for the same vector space, then m = n.

Proof Suppose there are more w’s than v’s. From n > m we want to reach a contradiction.
The v’s are a basis, so $w_1$ must be a combination of the v’s. If $w_1$ equals
$a_{11}v_1 + \cdots + a_{m1}v_m$, this is the first column of a matrix multiplication VA:

$$
W = \begin{bmatrix} w_1 & w_2 & \cdots & w_n \end{bmatrix}
= \begin{bmatrix} v_1 & \cdots & v_m \end{bmatrix}
\begin{bmatrix} a_{11} & \cdots \\ \vdots & \\ a_{m1} & \cdots \end{bmatrix} = VA. \tag{3.5}
$$

We don’t know the a’s, but we know the shape of A (it is m by n). The second vector $w_2$ is
also a combination of the v’s. The coefficients in that combination fill the second column
of A. The key is that A has a row for every v and a column for every w. It is a short wide
matrix, since n > m. There is a nonzero solution to Ax = 0. But then VAx = 0 and
Wx = 0 and a combination of the w’s gives zero! The w’s could not be a basis, which is
the contradiction we wanted.

If m > n we exchange the v’s and w’s and repeat the same steps. The only way to
avoid a contradiction is to have m = n. This completes the proof.


The number of basis vectors depends on the space, not on a particular basis. The
number is the same for every basis, and it tells how many “degrees of freedom” the vector
space allows. For the space $R^n$, the number is n. This is the “dimension” of $R^n$. We now
introduce the important word dimension for other spaces too.


Definition The dimension of a vector space is the number of vectors in every basis. 


This matches our intuition. The line through v = (1,5, 2) has dimension one. It is a 
subspace with that one vector in its basis. Perpendicular to that line is the plane x + 5y + 
2z = 0. This plane has dimension 2. To prove it, we find a basis (—5, 1, 0) and (—2, 0, 1). 
The dimension is 2 because the basis contains two vectors. 

The plane is the nullspace of the matrix A =[1 5 2], which has two free variables. 
Our basis vectors (—5, 1, 0) and (—2, 0, 1) are the “special solutions” to Ax = 0. The next 
section studies other nullspaces, and here we emphasize only this: The basis is not unique 
(unless the rank is n and the basis is empty). But all bases contain the same number of 
vectors. 


Summary The key words of this section are “independence” and “span” and “basis” and 
“dimension.” The connections are clearest for independent columns: 


3J A matrix with full column rank has all these properties: 

1 The n columns are independent. 

2 The only solution to Ax = 0 is x = 0. 

3 Rank of the matrix = dimension of the column space = n. 


4 The columns are a basis for the column space.

The next chapter adds Property 5 to this list: The square matrix $A^{\mathrm T}A$ is invertible. These
are the only matrices which Chapter 4 allows.


Note about the language of linear algebra We never say “rank of a space” or “dimension 
of a basis” or “basis of a matrix.” Those terms have no meaning. It is the dimension of the 
column space that equals the rank of the matrix. 


Bases for Matrix Spaces and Function Spaces 


The words “independence” and “basis” and “dimension” are not at all restricted to column
vectors. We can ask whether matrices $A_1, A_2, A_3$ are independent. We can find a basis for
the solutions to $d^2y/dx^2 = y$. That basis contains functions, maybe $y = e^x$ and $y = e^{-x}$.
Counting the basis functions gives the dimension 2.

We think matrix spaces and function spaces are optional. Your class can go past this
part with no problem. But in some way, you haven’t got the ideas straight until you can apply
them to “vectors” other than column vectors.


Matrix spaces The vector space M contains all 2 by 2 matrices. Its dimension is 4 and
here is a basis:

$$
A_1, A_2, A_3, A_4 =
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\
\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},\
\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.
$$

Those matrices are linearly independent. We are not looking at their columns, but at the 
whole matrix. Combinations of those four matrices can produce any matrix in M, so they 
span the space: 


$$
c_1 A_1 + c_2 A_2 + c_3 A_3 + c_4 A_4 = \begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix}.
$$


This is zero only if the c’s are all zero, which proves independence.

The matrices $A_1, A_2, A_4$ are a basis for a subspace: the upper triangular matrices.
Its dimension is 3. $A_1$ and $A_4$ are a basis for a two-dimensional subspace: the diagonal
matrices. What is a basis for the symmetric matrices? Keep $A_1$ and $A_4$, and throw in
$A_2 + A_3$.

To push this further, think about the space of all n by n matrices. For a basis, choose
matrices that have only a single nonzero entry (that entry is 1). There are $n^2$ positions for
that 1, so there are $n^2$ basis matrices:


The dimension of the whole matrix space is $n^2$.
The dimension of the subspace of upper triangular matrices is $\frac{1}{2}n^2 + \frac{1}{2}n$.
The dimension of the subspace of diagonal matrices is n.
The dimension of the subspace of symmetric matrices is $\frac{1}{2}n^2 + \frac{1}{2}n$.

Function spaces $d^2y/dx^2 = 0$ and $d^2y/dx^2 + y = 0$ and $d^2y/dx^2 - y = 0$ involve
the second derivative. In calculus we solve these equations to find the functions y(x):


$y'' = 0$ is solved by any linear function $y = cx + d$
$y'' = -y$ is solved by any combination $y = c \sin x + d \cos x$
$y'' = y$ is solved by any combination $y = ce^x + de^{-x}$.


The second solution space has two basis functions: sinx and cosx. The third solution 
space has basis functions $e^x$ and $e^{-x}$. The first space has x and 1. It is the “nullspace” of
the second derivative! The dimension is 2 in each case (these are second-order equations). 

What about $y'' = 1$? Its solutions do not form a subspace, because there is a nonzero right
side b = 1. A particular solution is $y(x) = \frac{1}{2}x^2$. The complete solution is
$y(x) = \frac{1}{2}x^2 + cx + d$. All those functions satisfy $y'' = 1$. Notice the particular solution plus any
function cx + d in the nullspace. 
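
As a quick check of that last claim (a worked verification added here, not taken from the original text), differentiate the complete solution twice. The constants c and d disappear, which is why they are free:

```latex
y(x) = \tfrac{1}{2}x^2 + cx + d, \qquad
y'(x) = x + c, \qquad
y''(x) = 1 .
```

The difference of two such solutions is $(c_1 - c_2)x + (d_1 - d_2)$. Its second derivative is zero, so it lies in the nullspace of the second derivative.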

A linear differential equation is like a linear matrix equation Ax = b. But we solve 
it by calculus instead of linear algebra. The particular solution is a function, the special 
solutions are functions (most often they are $y = e^{cx}$).


We end here with the space Z that contains only the zero vector. The dimension of this 
space is zero. The empty set (containing no vectors) is a basis. We can never allow the 
zero vector into a basis, because then a combination of the vectors gives zero—and linear 
independence is lost. 


Problem Set 3.4 


Questions 1-10 are about linear independence and linear dependence. 


1 Show that v1, v2, v3 are independent but v1, v2, v3, v4 are dependent: 
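$$v_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \quad v_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \quad v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad v_4 = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$$


Solve either $c_1v_1 + c_2v_2 + c_3v_3 = 0$ or Ax = 0. The v’s go in the columns of A.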


2 (Recommended) Find the largest possible number of independent vectors among 


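$$v_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix} \quad v_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \\ 0 \end{bmatrix} \quad v_3 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ -1 \end{bmatrix} \quad v_4 = \begin{bmatrix} 0 \\ 1 \\ -1 \\ 0 \end{bmatrix} \quad v_5 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ -1 \end{bmatrix} \quad v_6 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ -1 \end{bmatrix}$$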


3 Prove that if a = 0 or d = 0 or f = 0 (3 cases), the columns of U are dependent: 


$$U = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix}$$


4 If a, d, f in Question 3 are all nonzero, show that the only solution to Ux = 0 is
x = 0. Then U has independent columns.


5 Decide the dependence or independence of
(a) the vectors (1, 3, 2) and (2, 1, 3) and (3, 2, 1)
(b) the vectors (1, −3, 2) and (2, 1, −3) and (−3, 2, 1).


6 Choose three independent columns of U. Then make two other choices. Do the same
for A.


$$U = \begin{bmatrix} 2 & 3 & 4 & 1 \\ 0 & 6 & 7 & 0 \\ 0 & 0 & 0 & 9 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$


7 If w1, w2, w3 are independent vectors, show that the differences $v_1 = w_2 - w_3$ and
$v_2 = w_1 - w_3$ and $v_3 = w_1 - w_2$ are dependent. Find a combination of the v’s that
gives zero.


8 If w1, w2, w3 are independent vectors, show that the sums $v_1 = w_2 + w_3$,
$v_2 = w_1 + w_3$, and $v_3 = w_1 + w_2$ are independent. (Write $c_1v_1 + c_2v_2 + c_3v_3 = 0$ in
terms of the w’s. Find and solve equations for the c’s.)


9 Suppose v1, v2, v3, v4 are vectors in $\mathbf{R}^3$.


(a) These four vectors are dependent because ____.
(b) The two vectors v1 and v2 will be dependent if ____.


(c) The vectors v1 and (0, 0, 0) are dependent because ____.


10 Find two independent vectors on the plane $x + 2y - 3z - t = 0$ in $\mathbf{R}^4$. Then find three
independent vectors. Why not four? This plane is the nullspace of what matrix?


Questions 11-15 are about the space spanned by a set of vectors. Take all linear 
combinations of the vectors. 


11 Describe the subspace of $\mathbf{R}^3$ (is it a line or plane or $\mathbf{R}^3$?) spanned by


(a) the two vectors (1, 1, −1) and (−1, −1, 1)

(b) the three vectors (0, 1, 1) and (1, 1, 0) and (0, 0, 0)

(c) the columns of a 3 by 5 echelon matrix with 2 pivots

(d) all vectors with positive components.


12 The vector b is in the subspace spanned by the columns of A when there is a solution
to ____. The vector c is in the row space of A when there is a solution to ____.
True or false: If the zero vector is in the row space, the rows are dependent.


13 Find the dimensions of these 4 spaces. Which two of the spaces are the same? (a) column
space of A, (b) column space of U, (c) row space of A, (d) row space of U:


$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 3 & 1 \\ 3 & 1 & -1 \end{bmatrix} \quad \text{and} \quad U = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$


14 Choose $x = (x_1, x_2, x_3, x_4)$ in $\mathbf{R}^4$. It has 24 rearrangements like $(x_2, x_1, x_3, x_4)$ and
$(x_4, x_3, x_1, x_2)$. Those 24 vectors, including x itself, span a subspace S. Find specific
vectors x so that the dimension of S is: (a) zero, (b) one, (c) three, (d) four.


15 $v + w$ and $v - w$ are combinations of v and w. Write v and w as combinations of
$v + w$ and $v - w$. The two pairs of vectors ____ the same space. When are they a
basis for the same space?


Questions 16-26 are about the requirements for a basis. 


16 If v1, ..., vn are linearly independent, the space they span has dimension ____.
These vectors are a ____ for that space. If the vectors are the columns of an m by
n matrix, then m is ____ than n.


17 Find a basis for each of these subspaces of $\mathbf{R}^4$:


(a) All vectors whose components are equal. 

(b) All vectors whose components add to zero. 

(c) All vectors that are perpendicular to (1, 1, 0, 0) and (1, 0, 1, 1). 
(d) The column space and the nullspace of U =[§9)9 4]. 


18 Find three different bases for the column space of U above. Then find two different
bases for the row space of U. 


19 Suppose v1, v2, ..., v6 are six vectors in $\mathbf{R}^4$.


(a) Those vectors (do)(do not)(might not) span R4. 

(b) Those vectors (are)(are not)(might be) linearly independent. 

(c) Any four of those vectors (are)(are not)(might be) a basis for R4. 

20 The columns of A are n vectors from $\mathbf{R}^m$. If they are linearly independent, what is
the rank of A? If they span $\mathbf{R}^m$, what is the rank? If they are a basis for $\mathbf{R}^m$, what
then?


21 Find a basis for the plane $x - 2y + 3z = 0$ in $\mathbf{R}^3$. Then find a basis for the intersection
of that plane with the xy plane. Then find a basis for all vectors perpendicular to the
plane.


22 Suppose the columns of a 5 by 5 matrix A are a basis for $\mathbf{R}^5$.


(a) The equation Ax = 0 has only the solution x = 0 because ____.


(b) If b is in $\mathbf{R}^5$ then Ax = b is solvable because ____.


Conclusion: A is invertible. Its rank is 5.


23 Suppose S is a 5-dimensional subspace of $\mathbf{R}^6$. True or false:


(a) Every basis for S can be extended to a basis for $\mathbf{R}^6$ by adding one more vector.


(b) Every basis for $\mathbf{R}^6$ can be reduced to a basis for S by removing one vector.


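24 U comes from A by subtracting row 1 from row 3:

$$A = \begin{bmatrix} 1 & 3 & 2 \\ 0 & 1 & 1 \\ 1 & 3 & 2 \end{bmatrix} \quad \text{and} \quad U = \begin{bmatrix} 1 & 3 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

Find bases for the two column spaces. Find bases for the two row spaces. Find bases
for the two nullspaces.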


25 True or false (give a good reason):


(a) If the columns of a matrix are dependent, so are the rows. 
(b) The column space of a 2 by 2 matrix is the same as its row space. 
(c) The column space of a 2 by 2 matrix has the same dimension as its row space. 


(d) The columns of a matrix are a basis for the column space. 


26 For which numbers c and d do these matrices have rank 2?


12505 F 
A=l0 0c 22| and ge T 
000 d2 


Questions 27-32 are about spaces where the “vectors” are matrices. 


27 Find a basis for each of these subspaces of 3 by 3 matrices:


(a) All diagonal matrices. 
(b) All symmetric matrices ($A^T = A$).


(c) All skew-symmetric matrices ($A^T = -A$).


28 Construct six linearly independent 3 by 3 echelon matrices $U_1, \ldots, U_6$.


29 Find a basis for the space of all 2 by 3 matrices whose columns add to zero. Find a
basis for the subspace whose rows also add to zero.


30 Show that the six 3 by 3 permutation matrices (Section 2.6) are linearly dependent.


31 What subspace of 3 by 3 matrices is spanned by


(a) all invertible matrices?
(b) all echelon matrices?


(c) the identity matrix?


32 Find a basis for the space of 2 by 3 matrices whose nullspace contains (2, 1, 1).


Questions 33-37 are about spaces where the “vectors” are functions. 


33 (a) Find all functions that satisfy $dy/dx = 0$.

(b) Choose a particular function that satisfies $dy/dx = 3$.

(c) Find all functions that satisfy $dy/dx = 3$.

34 The cosine space F3 contains all combinations $y(x) = A \cos x + B \cos 2x + C \cos 3x$.
Find a basis for the subspace with y(0) = 0. 


35 Find a basis for the space of functions that satisfy
(a) $dy/dx - 2y = 0$ (b) $dy/dx - y/x = 0$.


36 Suppose $y_1(x), y_2(x), y_3(x)$ are three different functions of x. The vector space they
span could have dimension 1, 2, or 3. Give an example y1, y2, y3 to show each 
possibility. 


37 Find a basis for the space of polynomials p(x) of degree ≤ 3. Find a basis for the
subspace with p(1) = 0. 


38 Find a basis for the space S of vectors (a, b, c, d) with a + c + d = 0 and also for
the space T with a + b = 0 and c = 2d. What is the dimension of the intersection
$S \cap T$?


3.5 Dimensions of the Four Subspaces 


The main theorem in this chapter connects rank and dimension. The rank of a matrix is 
the number of pivots. The dimension of a subspace is the number of vectors in a basis. We 
count pivots or we count basis vectors. The rank of A reveals the dimensions of all four 
fundamental subspaces. Here are the subspaces: 


1 The row space is $R(A^T)$, a subspace of $\mathbf{R}^n$.
2 The column space is $R(A)$, a subspace of $\mathbf{R}^m$.
3 The nullspace is $N(A)$, a subspace of $\mathbf{R}^n$.
4 The left nullspace is $N(A^T)$, a subspace of $\mathbf{R}^m$. This is our new space.

In this book the column space and nullspace came first. We know R(A) and N(A) pretty
well. The other two subspaces were barely mentioned; now they come forward. For the
row space we take all combinations of the rows. This is also the column space of $A^T$. For
the left nullspace we solve $A^Ty = 0$; that system is n by m. The vectors y go on the left
side of A when the equation is written as $y^TA = 0^T$.

The matrices A and $A^T$ are usually very different. So are their column spaces and so
are their nullspaces. But those spaces are connected in an absolutely beautiful way. 

Part 1 of the Fundamental Theorem finds the dimensions of the four subspaces. One 
fact will stand out: The row space and column space have the same dimension. That di- 
mension is r (the rank of the matrix). The other important fact involves the two nullspaces: 
Their dimensions are n — r and m — r, to make up the full dimensions n and m. 

Part 2 of the Theorem will describe how the four subspaces fit together (two in $\mathbf{R}^n$
and two in $\mathbf{R}^m$). That completes the “right way” to understand Ax = b. Stay with it: you
are doing real mathematics.


The Four Subspaces for U 


Suppose we have an echelon matrix. It is upper triangular, with pivots in a staircase pat- 
tern. We call it U, not A, to emphasize this special form. Because of that form, the four 
subspaces are easy to identify. We will find a basis for each subspace and check its dimen- 
sion. Then for any other matrix A, we watch how the subspaces change (or don’t change) 
as elimination takes us to U. 

As a specific 3 by 5 example, look at the four subspaces for the echelon matrix U: 


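$$\begin{array}{l} m = 3 \\ n = 5 \\ r = 2 \end{array} \qquad U = \begin{bmatrix} 1 & 3 & 5 & 7 & 9 \\ 0 & 0 & 0 & 4 & 8 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \qquad \begin{array}{l} \text{pivot rows 1 and 2} \\ \text{pivot columns 1 and 4} \end{array}$$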


The pivots are 1 and 4. The rank is r = 2 (two pivots). Take the subspaces in order: 


1 The row space of U has dimension 2, matching the rank. Reason: The first two rows 
are a basis. The row space contains combinations of all three rows, but the third row (the 
zero row) adds nothing new. So rows 1 and 2 span the row space. 

Rows 1 and 2 are also independent. Certainly (1, 3, 5, 7, 9) and (0, 0, 0, 4, 8) are not
parallel. But we give a proof of independence which applies to the nonzero rows of any
echelon matrix. Start the proof with a combination of rows 1 and 2:


$$c_1(\text{row } 1) + c_2(\text{row } 2) = (c_1, 3c_1, 5c_1, 7c_1 + 4c_2, 9c_1 + 8c_2).$$


The key is this: Suppose this combination equals (0, 0, 0, 0, 0). You have to show that both
c’s must be zero. Look at the first component; it gives $c_1 = 0$. With $c_1$ gone, look at the
fourth component; it forces $c_2 = 0$.

If there were r nonzero rows, we would start with any combination. We would look
at the first pivot position. We would discover that $c_1 = 0$. Then $c_2 = 0$ and $c_3 = 0$. All
the c’s are forced to be zero, which means that the rows are independent. Conclusion: The
r pivot rows are a basis.


The dimension of the row space is r. The nonzero rows of U form a basis. 


2 The column space of U also has dimension r = 2. Reason: The columns with pivots 
form a basis. In the example, the pivot columns 1 and 4 are independent. For a proof 
that applies to any echelon matrix, start the same way as for the rows. Suppose $c_1$ times
the first column (1, 0, 0) plus $c_2$ times the fourth column (7, 4, 0) gives the zero vector:
$(c_1 + 7c_2, 4c_2, 0) = (0, 0, 0)$. This time we work backwards: The second component
$4c_2 = 0$ forces $c_2$ to be zero. Then the first component forces $c_1$ to be zero. This is nothing
but back substitution, and all the c’s come out zero.
So the pivot columns are independent: 


The dimension of the column space is r. The pivot columns form a basis. 


Those pivot columns produce any b in the column space. Set all the free variables to zero. 
Then back substitute in Ux = b, to find the pivot variables $x_1$ and $x_4$. This makes b a
combination of columns 1 and 4. The pivot columns span the column space. 

You can see directly that column 2 is 3(column 1). Also column 3 is 5(column 1). 
Note for future reference: Column 5 equals 2(column 4) − 5(column 1).
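
The three column relations just stated can be confirmed by direct arithmetic. This is a small sketch in Python added for illustration; the helper names `column` and `combination` are not from the text, and U is the 3 by 5 echelon matrix of this section.

```python
U = [[1, 3, 5, 7, 9],
     [0, 0, 0, 4, 8],
     [0, 0, 0, 0, 0]]


def column(M, j):
    """Column j of M, numbered from 1 as in the text."""
    return [row[j - 1] for row in M]


def combination(terms):
    """Sum of coefficient * vector over a list of (coefficient, vector) pairs."""
    size = len(terms[0][1])
    return [sum(c * v[i] for c, v in terms) for i in range(size)]


col1, col4 = column(U, 1), column(U, 4)

# Each free column written as a combination of the pivot columns 1 and 4.
checks = {
    "column 2 = 3(column 1)": column(U, 2) == combination([(3, col1)]),
    "column 3 = 5(column 1)": column(U, 3) == combination([(5, col1)]),
    "column 5 = 2(column 4) - 5(column 1)":
        column(U, 5) == combination([(2, col4), (-5, col1)]),
}
for name, ok in checks.items():
    print(name, ok)
```

Every free column is a combination of the pivot columns, which is why the pivot columns alone span the column space.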


3 The nullspace of this U has dimension $n - r = 5 - 2$. There are $n - r = 3$ free
variables. Here $x_2, x_3, x_5$ are free (no pivots in those columns). Those 3 free variables
yield 3 special solutions to Ux = 0. Set a free variable to 1, and solve for $x_1$ and $x_4$:


$$s_2 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \qquad s_3 = \begin{bmatrix} -5 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \qquad s_5 = \begin{bmatrix} 5 \\ 0 \\ 0 \\ -2 \\ 1 \end{bmatrix}$$

Ux = 0 has the complete solution $x = x_2s_2 + x_3s_3 + x_5s_5$.


There is a special solution for each free variable. With n variables and r pivot variables, 
the count is easy: 


The nullspace has dimension n — r. The special solutions form a basis. 


We have to remark: Those solutions tell again how to write columns 2, 3, 5 as com- 
binations of 1 and 4. The special solution (—3, 1, 0, 0, 0) says that column 2 equals 3(col- 
umn 1). 

The serious step is to prove that we have a basis. First, the special solutions are 
independent: Any combination $x_2s_2 + x_3s_3 + x_5s_5$ has the numbers $x_2, x_3, x_5$ in components
2, 3, 5. So this combination gives the zero vector only if $x_2 = x_3 = x_5 = 0$. The special
solutions span the nullspace: If Ux equals zero then $x = x_2s_2 + x_3s_3 + x_5s_5$. In this
equation, the components $x_2, x_3, x_5$ are the same on both sides. The pivot variables $x_1$ and
$x_4$ must also agree, since they are totally determined by Ux = 0. This proves what we strongly
believed: by combining the special solutions we get all solutions.
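
To see the special solutions at work, the sketch below (Python, added for illustration; `matvec` and `nullspace_vector` are hypothetical helper names) multiplies U by each special solution and by one sample combination. The free components 2, 3, 5 of the combination are exactly the chosen weights.

```python
U = [[1, 3, 5, 7, 9],
     [0, 0, 0, 4, 8],
     [0, 0, 0, 0, 0]]

# Special solutions: one free variable set to 1, the other free variables to 0.
s2 = [-3, 1, 0, 0, 0]
s3 = [-5, 0, 1, 0, 0]
s5 = [5, 0, 0, -2, 1]


def matvec(M, x):
    """Matrix times vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def nullspace_vector(x2, x3, x5):
    """The combination x2*s2 + x3*s3 + x5*s5."""
    return [x2 * a + x3 * b + x5 * c for a, b, c in zip(s2, s3, s5)]


for s in (s2, s3, s5):
    print(matvec(U, s))  # each product is the zero vector

x = nullspace_vector(2, -1, 4)
print(x, matvec(U, x))
```

The sample weights 2, −1, 4 are arbitrary; any other choice also lands in the nullspace.
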

4 The left nullspace of this U has dimension $m - r = 3 - 2$. Reason: $U^Ty = 0$
has one free variable. There is a line of solutions $y = (0, 0, y_3)$. This left nullspace is
1-dimensional.

$U^Ty$ is a combination of the rows of U, just as Ux is a combination of the columns.
So $U^Ty = 0$ means:


$y_1$ times row 1 plus $y_2$ times row 2 plus $y_3$ times row 3 equals the zero row.


Conclusion: $y_3$ can be any number, because row 3 of U is all zeros. But $y_1 = y_2 = 0$
(because rows 1 and 2 are independent). The special solution is $y = (0, 0, 1)$.

An echelon matrix U has $m - r$ zero rows. We are solving $U^Ty = 0$. The last $m - r$
components of y are free:


$$y = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ y_{r+1} \\ \vdots \\ y_m \end{bmatrix}$$

The left nullspace has dimension $m - r$.
The first r components of y are zero.

The first r rows of U are independent. The other rows are zero. To produce a zero combination,
y must start with r zeros. This leaves dimension $m - r$.

Why is this a “left nullspace”? The reason is that $U^Ty = 0$ can be transposed to
$y^TU = 0^T$. Now $y^T$ is a row vector to the left of U. This subspace came fourth, and some
linear algebra books omit it, but that misses the beauty of the whole subject.
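
The row-vector picture can be tried out numerically. In this sketch (Python, added for illustration; `left_multiply` is a hypothetical helper name) the row vector y multiplies U from the left, so the result is a combination of the rows of U.

```python
U = [[1, 3, 5, 7, 9],
     [0, 0, 0, 4, 8],
     [0, 0, 0, 0, 0]]


def left_multiply(y, M):
    """Row vector y times M: y[0]*(row 1) + y[1]*(row 2) + ..."""
    return [sum(y[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]


print(left_multiply([0, 0, 1], U))  # special solution: gives the zero row
print(left_multiply([0, 0, 7], U))  # any multiple also gives the zero row
print(left_multiply([1, 0, 0], U))  # a nonzero y1 reproduces row 1, not zero
```

Only the third component of y is free here, which matches the dimension 3 − 2 = 1 of this left nullspace.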


In $\mathbf{R}^n$ the row space and nullspace have dimensions r and $n - r$ (adding to n).


In $\mathbf{R}^m$ the column space and left nullspace have dimensions r and $m - r$ (total m).


So far this is proved for echelon matrices. Figure 3.5 shows the same for A. 


The Four Subspaces for A 


We have a small job still to do. The dimensions for A are the same as for U. The job is 
to explain why. Suppose for example that all multipliers in elimination are 1. Then A is L 
times U: 


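$$LU = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 3 & 5 & 7 & 9 \\ 0 & 0 & 0 & 4 & 8 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 3 & 5 & 7 & 9 \\ 1 & 3 & 5 & 11 & 17 \\ 1 & 3 & 5 & 11 & 17 \end{bmatrix} = A.$$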


This matrix A still has m = 3 and n = 5 and especially r = 2. We can go quickly through 
its four subspaces, finding a basis and checking the dimension. The reasoning doesn’t 
depend on this particular L or U. 
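
The product L times U can be multiplied out to confirm the matrix A of this example. The sketch below is plain Python added for illustration; `matmul` is a hypothetical helper, and L is the lower triangular matrix with all multipliers equal to 1.

```python
L = [[1, 0, 0],
     [1, 1, 0],
     [1, 1, 1]]

U = [[1, 3, 5, 7, 9],
     [0, 0, 0, 4, 8],
     [0, 0, 0, 0, 0]]


def matmul(P, Q):
    """Matrix product: row i of the result is a combination of the rows of Q."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


A = matmul(L, U)
for row in A:
    print(row)
```

Row 2 of A is row 1 plus row 2 of U, and row 3 of A is the sum of all three rows of U. Every row of A is a combination of the rows of U, with weights taken from L.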


1 A has the same row space as U. Same dimension r and same basis. 


Reason: Every row of A is a combination of the rows of U. Also every row of U is a 
combination of the rows of A. In one direction the combinations are given by L, in the 
other direction by $L^{-1}$. Elimination changes the rows of A to the rows of U, but the row
spaces are identical. 


Figure 3.5 The dimensions of the four fundamental subspaces (for U and for A). [Diagram: the column space $R(A)$, all Ax, has dimension r; the nullspace $N(A)$, Ax = 0, has dimension $n - r$; the left nullspace $N(A^T)$, $A^Ty = 0$, has dimension $m - r$.]


Since A has the same row space as U, we can choose the first r rows of U as a basis. 
Or we could choose r suitable rows of the original A. They might not always be the first 
r rows of A, because those could be dependent. (Then there will be row exchanges. The 
exchanges are made by P.) The first r rows of PA do form a basis for the row space.


2 The column space of A has dimension r. For every matrix this fact is essential: 
The number of independent columns equals the number of independent rows. 


Wrong reason: “A and U have the same column space.” This is false; look at A and U in
the example above. The columns of U end in zeros. The columns of A don’t end in zeros.
The column spaces are different, but their dimensions are the same, equal to r.


Right reason: A combination of the columns of A is zero exactly when the same combina- 
tion of the columns of U is zero. Say that another way: Ax = 0 exactly when Ux = 0. 
Columns 1 and 4 were independent in U, so columns 1 and 4 are independent in A. Look 
at the matrix. Columns 1 and 2 were dependent in U so they are dependent in A. 


Conclusion The r pivot columns in U are a basis for its column space. So the r corresponding
columns of A are a basis for its column space.


3 A has the same nullspace as U. Same dimension $n - r$ and same basis as U.


Reason: Ax = 0 exactly when Ux = 0. The elimination steps don’t change the solutions. 
The special solutions are a basis for this nullspace. There are n — r free variables, so the 
dimension is n — r. Notice that r + (n — r) equals n: 


(dimension of column space) + (dimension of nullspace) = dimension of $\mathbf{R}^n$.

4 The left nullspace of A (the nullspace of $A^T$) has dimension $m - r$.


Reason: $A^T$ is just as good a matrix as A. When we know the dimensions for every A, we
also know them for $A^T$. Its column space was proved to have dimension r. Since $A^T$ is n
by m, the “whole space” is now $\mathbf{R}^m$. The counting rule for A was $r + (n - r) = n$. The
counting rule for $A^T$ is $r + (m - r) = m$. So the nullspace of $A^T$ has dimension $m - r$.

We now have all details of the main theorem:


Fundamental Theorem of Linear Algebra, Part 1 


The column space and row space both have dimension r. 
The nullspaces have dimensions $n - r$ and $m - r$.
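
The theorem can be tried on the 3 by 5 example A = LU from this section. The following is a minimal sketch in Python, assuming exact rational arithmetic from the standard `fractions` module; the function names `rank`, `transpose` and `four_dimensions` are chosen here for illustration. The rank is found by counting pivots in elimination, once for A and once for its transpose.

```python
from fractions import Fraction


def rank(M):
    """Count pivots by Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    m, n = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for col in range(n):
        # Look for a nonzero entry in this column at or below row r.
        pivot = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # free column: no pivot here
        rows[r], rows[pivot] = rows[pivot], rows[r]  # row exchange if needed
        for i in range(r + 1, m):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == m:
            break
    return r


def transpose(M):
    return [list(col) for col in zip(*M)]


def four_dimensions(M):
    """Dimensions of the four subspaces, from the pivot counts of M and M^T."""
    m, n = len(M), len(M[0])
    r = rank(M)
    return {
        "row space": r,
        "column space": rank(transpose(M)),  # pivots of the transpose
        "nullspace": n - r,
        "left nullspace": m - r,
    }


A = [[1, 3, 5, 7, 9],
     [1, 3, 5, 11, 17],
     [1, 3, 5, 11, 17]]
print(four_dimensions(A))
```

The column space entry is computed separately from the transpose, so its agreement with the row space entry is an instance of the first statement of the theorem rather than something built into the code.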


By concentrating on spaces of vectors, not on individual numbers or vectors, we get these 
clean rules. You will soon take them for granted—eventually they begin to look obvious. 
But if you write down an 11 by 17 matrix with 187 nonzero entries, we don’t think most 
people would see why these facts are true: 


dimension of $R(A)$ = dimension of $R(A^T)$


dimension of R(A) + dimension of N(A) = 17. 


Example 3.20 A = [1 2 3] has m = 1 and n = 3 and rank r = 1.


The row space is a line in $\mathbf{R}^3$. The nullspace is the plane $Ax = x_1 + 2x_2 + 3x_3 = 0$. This
plane has dimension 2 (which is 3 − 1). The dimensions add to 1 + 2 = 3.

The columns of this 1 by 3 matrix are in $\mathbf{R}^1$! The column space is all of $\mathbf{R}^1$. The left
nullspace contains only the zero vector. The only solution to $A^Ty = 0$ is y = 0. That is
the only combination of the row that gives the zero row. Thus $N(A^T)$ is Z, the zero space
with dimension 0 (which is $m - r$). In $\mathbf{R}^m$ the dimensions add to 1 + 0 = 1.


Example 3.21 $A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{bmatrix}$ has m = 2 with n = 3 and r = 1.

The row space is the same line through (1, 2,3). The nullspace is the same plane x; + 
2x2 + 3x3 = 0. The dimensions still add to 1 + 2 = 3. 

The columns are multiples of the first column (1, 1). But there is more than the 
zero vector in the left nullspace. The first row minus the second row is the zero row. 
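Therefore $A^Ty = 0$ has the solution $y = (1, -1)$. The column space and left nullspace are
perpendicular lines in $\mathbf{R}^2$. Their dimensions are 1 and 1, adding to 2:


$$\text{column space} = \text{line through } \begin{bmatrix} 1 \\ 1 \end{bmatrix} \qquad \text{left nullspace} = \text{line through } \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$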


If A has three equal rows, its rank is ____. What are two of the y’s in its left nullspace?
The y’s combine the rows to give the zero row. 


Matrices of Rank One


That last example had rank r = 1—and rank one matrices are special. We can describe 
them all. You will see again that dimension of row space = dimension of column space. 
When r = 1, every row is a multiple of the same row: 


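$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ -3 & -6 & -9 \\ 0 & 0 & 0 \end{bmatrix} \quad \text{equals} \quad \begin{bmatrix} 1 \\ 2 \\ -3 \\ 0 \end{bmatrix} \text{ times } \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}.$$


A column times a row (4 by 1 times 1 by 3) produces a matrix (4 by 3). All rows are
multiples of the row (1, 2, 3). All columns are multiples of the column (1, 2, −3, 0). The
row space is a line in $\mathbf{R}^n$, and the column space is a line in $\mathbf{R}^m$.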


Every rank one matrix has the special form $A = uv^T$ = column times row.


The columns are multiples of u. The rows are multiples of $v^T$. The nullspace is the plane
perpendicular to v. ($Ax = 0$ means that $u(v^Tx) = 0$ and then $v^Tx = 0$.) It is this
perpendicularity of the subspaces that will be Part 2 of the Fundamental Theorem.
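
A short sketch of the column-times-row form, in Python and added for illustration (the helper names `outer`, `dot` and `matvec` are not from the text). It uses u = (1, 2, −3, 0) and v = (1, 2, 3), the column and row named in the example above, and checks that vectors perpendicular to v are sent to zero.

```python
def outer(u, v):
    """Column u times row v: the matrix with entries u[i] * v[j]."""
    return [[ui * vj for vj in v] for ui in u]


def dot(v, w):
    return sum(a * b for a, b in zip(v, w))


def matvec(M, x):
    return [dot(row, x) for row in M]


u = [1, 2, -3, 0]
v = [1, 2, 3]
A = outer(u, v)
for row in A:
    print(row)

# Two vectors chosen perpendicular to v; both lie in the nullspace of A.
for x in ([2, -1, 0], [3, 0, -1]):
    print(dot(v, x), matvec(A, x))

# A vector not perpendicular to v: Ax is the multiple (v . x) of u.
print(matvec(A, [1, 0, 0]))
```

Since Ax equals u times the number $v^Tx$, the output is always a multiple of u. That is the one-dimensional column space.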


We end this section with a 4 by 3 example. Its rank is 2. Its echelon form is U: 


$$A = \begin{bmatrix} 1 & 1 & 3 \\ 2 & 5 & 6 \\ 3 & 6 & 9 \\ 0 & 3 & 0 \end{bmatrix} \qquad U = \begin{bmatrix} 1 & 1 & 3 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

Rows 1 and 2 of A or U give a basis for the (same) row space. Columns 1 and 2 give
bases for the (different) column spaces. The special solution $s = (-3, 0, 1)$ is a basis for
the nullspace of A and U. The vectors (0, 0, 1, 0) and (0, 0, 0, 1) satisfy $U^Ty = 0$. The
vectors $y = (1, 1, -1, 0)$ and $(-2, 1, 0, -1)$ satisfy $A^Ty = 0$.


The nullspace has dimension 3 − 2 = 1. Both left nullspaces have dimension
____.
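
The vectors in this closing example can be checked by multiplication. The sketch below is Python added for illustration, with hypothetical helper names `times_vector` and `row_combination`. It takes A with rows (1, 1, 3), (2, 5, 6), (3, 6, 9), (0, 3, 0) and its echelon form U. For the second left nullspace vector of A it uses (−2, 1, 0, −1), which encodes the combination −2(row 1) + (row 2) − (row 4).

```python
A = [[1, 1, 3],
     [2, 5, 6],
     [3, 6, 9],
     [0, 3, 0]]

U = [[1, 1, 3],
     [0, 3, 0],
     [0, 0, 0],
     [0, 0, 0]]


def times_vector(M, x):
    """M times the column vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def row_combination(y, M):
    """y^T M: combine the rows of M with the weights in y."""
    return [sum(y[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]


s = [-3, 0, 1]
print(times_vector(A, s), times_vector(U, s))  # nullspace of A and of U

for y in ([0, 0, 1, 0], [0, 0, 0, 1]):
    print(row_combination(y, U))               # left nullspace of U

for y in ([1, 1, -1, 0], [-2, 1, 0, -1]):
    print(row_combination(y, A))               # left nullspace of A
```

Each product printed is a zero vector, so every listed vector lies in the subspace claimed for it.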


Problem Set 3.5 


1 (a) If a 7 by 9 matrix has rank 5, what are the dimensions of the four subspaces?


(b) If a 3 by 4 matrix has rank 3, what are its column space and left nullspace?


2 Find bases for the four subspaces associated with A and B: 


124 124 
a=); al a B=|) ar 
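
Find a basis for each of the four subspaces associated with


$$A = \begin{bmatrix} 0 & 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 4 & 6 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 & 2 & 3 & 4 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$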


Construct a matrix with the required property or explain why no such matrix exists: 


; 1 0 . 
(a) Column space contains Hi [o], row space contains | 4 ], [4 ]. 


3 
(b) Column space has basis |i i nullspace has basis | l | 
(c) Dimension of nullspace = 1 + dimension of left nullspace. 
(d) Left nullspace contains | } ], row space contains [ ? ]. 


(e) Row space = column space, nullspace ≠ left nullspace.


If V is the subspace spanned by (1, 1, 1) and (2, 1, 0), find a matrix A that has V as 
its row space and a matrix B that has V as its nullspace. 


Without elimination, find dimensions and bases for the four subspaces for 
1 

0 and B= | 4 

l 5 


Suppose the 3 by 3 matrix A is invertible. Write down bases for the four subspaces 
for A, and also for the 3 by 6 matrix B=[A A]. 


What are the dimensions of the four subspaces for A, B, and C, if I is the 3 by 3
identity matrix and 0 is the 3 by 2 zero matrix? 


Which subspaces are the same for these matrices of different sizes? 


A A A A 
(a) [A] and a (b) H and P 7 


Prove that all three matrices have the same rank r. 


If the entries of a 3 by 3 matrix are chosen randomly between 0 and 1, what are the 
most likely dimensions of the four subspaces? What if the matrix is 3 by 5? 


(Important) A is an m by n matrix of rank r. Suppose there are right sides b for 
which Ax = b has no solution. 


(a) What are all inequalities (< or ≤) that must be true between m, n, and r?


(b) How do you know that $A^Ty = 0$ has solutions other than y = 0?

Construct a matrix with (1, 0, 1) and (1, 2, 0) as a basis for its row space and its
column space. Why can’t this be a basis for the row space and nullspace? 


True or false: 


(a) If m = n then the row space of A equals the column space.
(b) The matrices A and −A share the same four subspaces.


(c) If A and B share the same four subspaces then A is a multiple of B. 
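
Without computing A, find bases for the four fundamental subspaces:

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 6 & 1 & 0 \\ 9 & 8 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 2 \end{bmatrix}$$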


If you exchange the first two rows of A, which of the four subspaces stay the same? 
If v = (1, 2, 3, 4) is in the column space of A, write down a vector in the column 
space of the new matrix. 


Explain why v = (1, 2, 3) cannot be a row of A and also be in the nullspace of A. 
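
Describe the four subspaces of $\mathbf{R}^3$ associated with

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad I + A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$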


(Left nullspace) Add the extra column b and reduce A to echelon form: 


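$$[A \;\; b] = \begin{bmatrix} 1 & 2 & 3 & b_1 \\ 4 & 5 & 6 & b_2 \\ 7 & 8 & 9 & b_3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 2 & 3 & b_1 \\ 0 & -3 & -6 & b_2 - 4b_1 \\ 0 & 0 & 0 & b_3 - 2b_2 + b_1 \end{bmatrix}$$


A combination of the rows of A has produced the zero row. What combination is it?
(Look at $b_3 - 2b_2 + b_1$ on the right side.) Which vectors are in the nullspace of $A^T$
and which are in the nullspace of A?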


Following the method of Problem 18, reduce A to echelon form and look at zero 
rows. The b column tells which combinations you have taken of the rows: 


12 bd TE 
a |3 4 b (b) 2 
4 6 b 2 4 b3 

2 2 5 by 


From the b column after elimination, read off vectors in the left nullspace of A 
(combinations of rows that give zero rows). Check that you have m — r basis vectors 
for $N(A^T)$.

(a) Describe all solutions to Ax = 0 if

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 4 & 1 \end{bmatrix} \begin{bmatrix} 4 & 2 & 0 & 1 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$


(b) How many independent solutions are there to $A^Ty = 0$?


(c) Give a basis for the column space of A. 


Suppose A is the sum of two matrices of rank one: $A = uv^T + wz^T$.


(a) Which vectors span the column space of A? 

(b) Which vectors span the row space of A? 

(c) The rank is less than 2 if ____ or if ____.

(d) Compute A and its rank if u = z = (1, 0, 0) and v = w = (0, 0, 1). 


Construct a matrix whose column space has basis (1, 2, 4), (2, 2, 1) and whose row 
space has basis (1, 0, 0), (0, 1, 1). 


Without multiplying matrices, find bases for the row and column spaces of A: 


ae 
TE H 
2I 


How do you know from these shapes that A is not invertible? 


ATy = d is solvable when the right side d is in which subspace? The solution is 
unique when the ____ contains only the zero vector.


True or false (with a reason or a counterexample): 


(a) A and $A^T$ have the same number of pivots.

(b) A and $A^T$ have the same left nullspace.

(c) If the row space equals the column space then $A^T = A$.

(d) If $A^T = -A$ then the row space of A equals the column space.

If AB = C, the rows of C are combinations of the rows of ____. So the rank of C
is not greater than the rank of ____. Since $B^TA^T = C^T$, the rank of C is also not
greater than the rank of ____.


If a, b, c are given with $a \neq 0$, how would you choose d so that $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ has rank
one? Find a basis for the row space and nullspace.
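
28 Find the ranks of the 8 by 8 checkerboard matrix B and chess matrix C:


$$B = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} r & n & b & q & k & b & n & r \\ p & p & p & p & p & p & p & p \\ \multicolumn{8}{c}{\text{four zero rows}} \\ p & p & p & p & p & p & p & p \\ r & n & b & q & k & b & n & r \end{bmatrix}$$
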

The numbers r, n, b, q, k, p are all different. Find bases for the row space and left 
nullspace of B and C. Challenge problem: Find a basis for the nullspace of C. 


ORTHOGONALITY 


4.1 Orthogonality of the Four Subspaces 


Vectors are orthogonal when their dot product is zero: $v \cdot w = 0$ or $v^Tw = 0$. This chapter
moves up a level, from orthogonal vectors to orthogonal subspaces. Orthogonal means
the same as perpendicular.

Subspaces entered Chapter 3 with a specific purpose—to throw light on Ax = b. 
Right away we needed the column space (for b) and the nullspace (for x). Then the light 
turned onto AT, uncovering two more subspaces. Those four fundamental subspaces reveal 
what a matrix really does. 

A matrix multiplies a vector: A times x. At the first level this is only numbers. At 
the second level Ax is a combination of column vectors. The third level shows subspaces. 
But we don’t think you have seen the whole picture until you study Figure 4.1. It fits the 
subspaces together, to show the hidden reality of A times x. The right angles between 
subspaces are something new—and we have to say what they mean. 

The row space is perpendicular to the nullspace. Every row of A is perpendicular 
to every solution of Ax = 0. Similarly every column is perpendicular to every solution of 
$A^Ty = 0$. That gives the 90° angle on the right side of the figure. This perpendicularity of
subspaces is Part 2 of the Fundamental Theorem of Linear Algebra. 

May we add a word about the left nullspace? It is never reached by Ax, so it might 
seem useless. But when b is outside the column space—when we want to solve Ax = b 
and can’t do it—then this nullspace of AT comes into its own. It contains the error in the 
“least-squares” solution. That is the key application of linear algebra in this chapter. 

Part 1 of the Fundamental Theorem gave the dimensions of the subspaces. The 
row and column spaces have the same dimension r (they are drawn the same size). The 
nullspaces have the remaining dimensions n — r and m — r. Now we will show that the 
row space and nullspace are actually perpendicular. 


Definition Two subspaces V and W of a vector space are orthogonal if every vector v 
in V is perpendicular to every vector w in W: 


$v \cdot w = 0$ or $v^Tw = 0$ for all v in V and all w in W.


Figure 4.1 Two pairs of orthogonal subspaces. Dimensions add to n and m. [Diagram: two subspaces of dimension r, one of them the column space of A containing Ax = b; the nullspace of $A^T$ has dimension $m - r$; the nullspace of A has dimension $n - r$.]


Example 4.1 The floor of your room (extended to infinity) is a subspace V. The line 
where two walls meet is a subspace W (one-dimensional). Those subspaces are orthogonal. 
Every vector up the meeting line is perpendicular to every vector in the floor. The origin 
(0, 0, 0) is in the corner. We assume you don’t live in a tent. 


Example 4.2 Suppose V is still the floor but W is one of the walls (a two-dimensional 
space). The wall and floor look orthogonal but they are not! You can find vectors in V 
and W that are not perpendicular. In fact a vector running along the bottom of the wall is 
also in the floor. This vector is in both V and W—and it is not perpendicular to itself. 


Example 4.3 If a vector is in two orthogonal subspaces, it must be perpendicular to itself.
It is v and it is w, so $v^Tv = 0$. This has to be the zero vector. Zero is the only point where
the nullspace meets the row space. 


The crucial examples for linear algebra come from the fundamental subspaces. 


4A Every vector x in the nullspace of A is perpendicular to every row of A. The nullspace 
and row space are orthogonal subspaces. 


To see why x is perpendicular to the rows, look at Ax = 0. Each row multiplies x: 
$$Ax = \begin{bmatrix} \text{row } 1 \\ \vdots \\ \text{row } m \end{bmatrix} \begin{bmatrix} \\ x \\ \\ \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{4.1}$$

The first equation says that row 1 is perpendicular to x. The last equation says that row m 
is perpendicular to x. Every row has a zero dot product with x. Then x is perpendicular 
to every combination of the rows. The whole row space $R(A^T)$ is orthogonal to the whole 
nullspace $N(A)$. 


Here is a second proof of that orthogonality for readers who like matrix shorthand. The 
vectors in the row space are combinations $A^T y$ of the rows. Take the dot product of $A^T y$ 
with any x in the nullspace. These vectors are perpendicular: 

$$x^T(A^T y) = (Ax)^T y = 0^T y = 0. \tag{4.2}$$

We like the first proof. You can see those rows of A multiplying x to produce zeros 
in equation (4.1). The second proof shows why A and $A^T$ are both in the Fundamental 
Theorem. $A^T$ goes with y and A goes with x. At the end we used Ax = 0. 

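In this next example, the rows are perpendicular to (1, 1, -1) in the nullspace: 

$$Ax = \begin{bmatrix} 1 & 3 & 4 \\ 5 & 2 & 7 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad\text{gives the dot products}\quad \begin{array}{l} 1 + 3 - 4 = 0 \\ 5 + 2 - 7 = 0 \end{array}$$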
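
The orthogonality in this example can be checked numerically. The following is only a small sketch in plain Python: the helper `dot` and the coefficient vector `y` are illustrative choices that do not come from the text, and the second row is taken as (5, 2, 7), which is what the dot product 5 + 2 - 7 = 0 requires.

```python
def dot(u, v):
    """Dot product of two vectors of equal length (hypothetical helper)."""
    return sum(ui * vi for ui, vi in zip(u, v))

rows = [(1, 3, 4), (5, 2, 7)]   # rows of A (second row assumed to be (5, 2, 7))
x = (1, 1, -1)                  # vector in the nullspace of A

# Each row of A has a zero dot product with x, as in equation (4.1).
row_products = [dot(r, x) for r in rows]

# Any combination A^T y of the rows is also perpendicular to x, as in (4.2).
y = (2, -3)                     # arbitrary coefficients, chosen for illustration
combo = tuple(y[0] * r0 + y[1] * r1 for r0, r1 in zip(*rows))
combo_product = dot(combo, x)
```

Any other choice of `y` would give a zero product as well, because the combination stays inside the row space.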


Now we turn to the other two subspaces. They are also orthogonal, but in $R^m$. 


4B Every vector y in the nullspace of AT is perpendicular to every column of A. The left 
nullspace and column space are orthogonal. 


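Apply the original proof to $A^T$. Its nullspace is orthogonal to its row space, which is the 
column space of A. Q.E.D. For a visual proof, look at $y^T A = 0$. The row vector $y^T$ 
multiplies each column of A: 

$$y^T A = \begin{bmatrix} & y^T & \end{bmatrix} \begin{bmatrix} \text{column } 1 & \cdots & \text{column } n \end{bmatrix} = \begin{bmatrix} 0 & \cdots & 0 \end{bmatrix}. \tag{4.3}$$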


The dot product with every column is zero. Then y is perpendicular to each column—and 
to the whole column space. 


Very important The fundamental subspaces are more than just orthogonal (in pairs). 
Their dimensions are also right. Two lines could be perpendicular in 3-dimensional space, 
but they could not be the row space and nullspace of a matrix. The lines have dimensions 1 
and 1, adding to 2. The correct dimensions r and n — r must add to n = 3. Our subspaces 
have dimensions 2 and 1, or 3 and 0. The fundamental subspaces are not only orthogonal, 
they are orthogonal complements. 


Definition The orthogonal complement of a subspace V contains every vector that is 
perpendicular to V. This orthogonal subspace is denoted by $V^\perp$ (pronounced “V perp”). 


By this definition, the nullspace is the orthogonal complement of the row space. 
Every x that is perpendicular to the rows satisfies Ax = 0, and is included in the nullspace. 

Figure 4.2 The true action of A times x: row space to column space, nullspace to zero. 

The reverse is also true (automatically). If v is orthogonal to the nullspace, it must 
be in the row space. Otherwise we could add this v as an extra row of the matrix, without 
changing the nullspace. The row space and its dimension would grow, which breaks the 
law r + (n - r) = n. We conclude that $N(A)^\perp$ is exactly the row space $R(A^T)$. 

The left nullspace and column space are not only orthogonal in $R^m$, they are orthogonal 
complements. The 90° angles are marked in Figure 4.2. Their dimensions add to the 
full dimension m. 


Fundamental Theorem of Linear Algebra, Part 2 


The nullspace is the orthogonal complement of the row space (in $R^n$). 
The left nullspace is the orthogonal complement of the column space (in $R^m$). 


Part 1 gave the dimensions of the subspaces, Part 2 gives their orientation. They are 
perpendicular (in pairs). The point of “complements” is that every x can be split into a 
row space component $x_r$ and a nullspace component $x_n$. When A multiplies $x = x_r + x_n$, 
Figure 4.2 shows what happens: 

The nullspace component goes to zero: $A x_n = 0$. 

The row space component goes to the column space: $A x_r = Ax$. 

Every vector goes to the column space! Multiplying by A cannot do anything else. But 
more than that: Every vector in the column space comes from one and only one vector $x_r$ 
in the row space. Proof: If $A x_r = A x_r'$, the difference $x_r - x_r'$ is in the nullspace. It is 
also in the row space, where $x_r$ and $x_r'$ came from. This difference must be the zero vector, 
because the spaces are perpendicular. Therefore $x_r = x_r'$. 

There is an invertible matrix hiding inside A, if we throw away the two nullspaces. 
From the row space to the column space, A is invertible. The “pseudoinverse” will invert 
it in Section 7.2. 


Example 4.4 Every diagonal matrix has an r by r invertible submatrix: 


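$$A = \begin{bmatrix} 3 & 0 & 0 & 0 & 0 \\ 0 & 5 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad\text{contains}\quad \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix}.$$
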
The rank is r = 2. The 2 by 2 submatrix in the upper corner is certainly invertible. The 
other eleven zeros are responsible for the nullspaces. 


Section 7.2 will show how every A becomes a diagonal matrix, when we choose the right 
bases for $R^n$ and $R^m$. This Singular Value Decomposition is a part of the theory that has 
become extremely important in applications. 


Combining Bases from Subspaces 


What follows are some valuable facts about bases. They could have come earlier, when 
bases were defined, but they were saved until now—when we are ready to use them. After a 
week you have a clearer sense of what a basis is (independent vectors that span the space). 
When the count is right, one of those two properties implies the other: 


4C Any n linearly independent vectors in $R^n$ must span $R^n$. Any n vectors that span $R^n$ 
must be independent. 


Normally we have to check both properties: First, that the vectors are linearly independent, 
and second, that they span the space. For n vectors in $R^n$, either independence or 
spanning is enough by itself. Starting with the correct number of vectors, one property of 
a basis implies the other. 

This is true in any vector space, but we care most about $R^n$. When the vectors go 
into the columns of an n by n matrix A, here are the same two facts. Remember that A is 
square: 


4D If the n columns are independent, they must span all of $R^n$. If the n columns span $R^n$, 
they must be independent. 


A square system Ax = b always has one solution if it never has two solutions, and 
vice versa. Uniqueness implies existence and existence implies uniqueness. The square 
matrix A is invertible. Its columns are a basis for $R^n$. 

Our standard method of proof is elimination. If there are no free variables (unique- 
ness), there must be n pivots. Then back substitution solves Ax = b (existence). In the 
opposite direction, suppose Ax = b can always be solved (existence of solutions). Then 
elimination produced no zero rows. There are n pivots and no free variables. The nullspace 
contains only x = 0 (uniqueness of solutions). 

The count is always right for the row space of A and its nullspace. They have dimensions 
r and n - r. With a basis for the row space and a basis for the nullspace, we have r + 
(n - r) = n vectors, the right number. Those n vectors are independent.† Therefore they 
span $R^n$. They are a basis: 


Every x in $R^n$ is the sum $x_r + x_n$ of a row space vector $x_r$ and a nullspace vector $x_n$. 


This confirms the splitting in Figure 4.2. It is the key point of orthogonal complements: 
the dimensions add to n and no vectors are missing. 


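Example 4.5 For $A = \begin{bmatrix} I & I \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}$, write any vector x as $x_r + x_n$. 
(1, 0, 1, 0) and (0, 1, 0, 1) are a basis for the row space. (1, 0, -1, 0) and (0, 1, 0, -1) are 
a basis for the nullspace. Those four vectors are a basis for $R^4$. Any x = (a, b, c, d) can 
be split into $x_r + x_n$: 

$$\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = \frac{a+c}{2} \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + \frac{b+d}{2} \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix} + \frac{a-c}{2} \begin{bmatrix} 1 \\ 0 \\ -1 \\ 0 \end{bmatrix} + \frac{b-d}{2} \begin{bmatrix} 0 \\ 1 \\ 0 \\ -1 \end{bmatrix}.$$
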
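
The splitting in Example 4.5 can be tried on a concrete vector. This is a minimal sketch, assuming the matrix A = [I I] of the example; the function names `split_row_null` and `apply_A` and the sample vector are hypothetical and serve only as illustration.

```python
from fractions import Fraction

def split_row_null(x):
    """Split x = (a, b, c, d) into row space and nullspace parts for A = [I I]."""
    a, b, c, d = (Fraction(v) for v in x)
    # Row space part: combination of (1, 0, 1, 0) and (0, 1, 0, 1).
    s, t = (a + c) / 2, (b + d) / 2
    x_r = (s, t, s, t)
    # Nullspace part: combination of (1, 0, -1, 0) and (0, 1, 0, -1).
    u, v = (a - c) / 2, (b - d) / 2
    x_n = (u, v, -u, -v)
    return x_r, x_n

def apply_A(x):
    """Multiply the 2 by 4 matrix A = [I I] by x."""
    return (x[0] + x[2], x[1] + x[3])

x = (4, 1, 2, 7)                 # sample vector, chosen only for illustration
x_r, x_n = split_row_null(x)
```

For this sample, A sends `x_n` to zero and sends `x_r` to the same output as `x`, which is the picture in Figure 4.2.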
Problem Set 4.1 
Questions 1-10 grow out of Figures 4.1 and 4.2. 
1 Suppose A is a 2 by 3 matrix of rank one. Draw Figure 4.1 to match the sizes of the 
subspaces. 

2 Redraw Figure 4.2 for a 3 by 2 matrix of rank r = 2. What are the two parts $x_r$ 
and $x_n$? 

3 Construct a matrix with the required property or say why that is impossible: 

(a) Column space contains [...] and [...], nullspace contains [...] 

(b) Row space contains [...] and [...], nullspace contains [...] 

(c) Column space is perpendicular to nullspace 

(d) Row 1 + row 2 + row 3 = 0, column space contains (1, 2, 3) 

(e) Columns add up to zero column, rows add to a row of 1’s. 

4 It is possible for the row space to contain the nullspace. Find an example. 

†If a combination of the vectors gives $x_r + x_n = 0$, then $x_r = -x_n$ is in both subspaces. It is orthogonal to 
itself and must be zero. All coefficients of the row space basis and nullspace basis must be zero, which proves 
independence of the n vectors together. 

5 (a) If Ax = b has a solution and $A^T y = 0$, then y is perpendicular to ____. 

(b) If Ax = b has no solution and $A^T y = 0$, explain why y is not perpendicular 
to ____. 

6 In Figure 4.2, how do we know that $A x_r$ is equal to Ax? How do we know that this 
vector is in the column space? 

7 If Ax is in the nullspace of $A^T$ then Ax must be ____. Why? Which other subspace 
is Ax in? This is important: $A^T A$ has the same nullspace as A. 

8 Suppose A is a symmetric matrix ($A^T = A$). 

(a) Why is its column space perpendicular to its nullspace? 

(b) If Ax = 0 and Az = 5z, why is x perpendicular to z? These are “eigenvectors.” 


9 (Recommended) Draw Figure 4.2 to show each subspace correctly for 

A = [...] and B = [...] 

10 Find the pieces $x_r$ and $x_n$ and draw Figure 4.2 properly if 

$A = \begin{bmatrix} 1 & -1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$ and x = [...] 


Questions 11-19 are about orthogonal subspaces. 


11 Prove that every y in $N(A^T)$ is perpendicular to every Ax in the column space, using 
the matrix shorthand of equation (4.2). Start from $A^T y = 0$. 

12 The Fundamental Theorem is also stated in the form of Fredholm’s alternative: For 
any A and b, exactly one of these two problems has a solution: 

(a) Ax = b 
(b) $A^T y = 0$ with $b^T y \neq 0$. 

Either b is in the column space of A or else b is not orthogonal to the nullspace of $A^T$. 
Choose A and b so that (a) has no solution. Find a solution to (b). 

13 If S is the subspace of $R^3$ containing only the zero vector, what is $S^\perp$? If S is 
spanned by (1, 1, 1), what is $S^\perp$? If S is spanned by (2, 0, 0) and (0, 0, 3), what 
is $S^\perp$? 

14 Suppose S only contains two vectors (1, 5, 1) and (2, 2, 2) (not a subspace). Then 
$S^\perp$ is the nullspace of the matrix A = ____. Therefore $S^\perp$ is a ____ even if S 
is not. 

15 Suppose L is a one-dimensional subspace (a line) in $R^3$. Its orthogonal complement 
$L^\perp$ is the ____ perpendicular to L. Then $(L^\perp)^\perp$ is a ____ perpendicular to 
$L^\perp$. In fact $(L^\perp)^\perp$ is the same as ____. 

16 Suppose V is the whole space $R^4$. Then $V^\perp$ contains only the vector ____. Then 
$(V^\perp)^\perp$ contains ____. So $(V^\perp)^\perp$ is the same as ____. 

17 Suppose S is spanned by the vectors (1, 2, 2, 3) and (1, 3, 3, 2). Find two vectors 
that span $S^\perp$. 

18 If P is the plane of vectors in $R^4$ satisfying $x_1 + x_2 + x_3 + x_4 = 0$, write a basis 
for $P^\perp$. Construct a matrix that has P as its nullspace. 

19 If a subspace S is contained in a subspace V, prove that the subspace $S^\perp$ contains $V^\perp$. 


Questions 20-25 are about perpendicular columns and rows. 


20 Suppose an n by n matrix is invertible: $AA^{-1} = I$. Then the first column of $A^{-1}$ is 
orthogonal to the space spanned by ____. 

21 Suppose the columns of A are unit vectors, all mutually perpendicular. What is $A^T A$? 

22 Construct a 3 by 3 matrix A with no zero entries whose columns are mutually perpendicular. 
Compute $A^T A$. Why is it a diagonal matrix? 

23 The lines $3x + y = b_1$ and $6x + 2y = b_2$ are ____. They are the same line 
if ____. In that case $(b_1, b_2)$ is perpendicular to the vector ____. The nullspace 
of the matrix is the line 3x + y = ____. One particular vector in that nullspace is ____. 

24 Why is each of these statements false? 

(a) (1, 1, 1) is perpendicular to (1, 1, -2) so the planes x + y + z = 0 and x + y - 
2z = 0 are orthogonal subspaces. 

(b) The lines from (0, 0, 0) through (2, 4, 5) and (1, -3, 2) are orthogonal complements. 

(c) If two subspaces meet only in the zero vector, the subspaces are orthogonal. 

25 Find a matrix with v = (1, 2, 3) in the row space and column space. Find another 
matrix with v in the nullspace and column space. Which pairs of subspaces can v 
not be in? 


Figure 4.3 The projection of b onto a line and a plane. 


4.2 Projections 


May we start this section with two questions? (In addition to that one.) The first question 
aims to show that projections are easy to visualize. The second question is about matrices: 


1 What are the projections of b = (2, 3, 4) onto the z axis and the xy plane? 
2 What matrices produce those projections? 


If b is projected onto a line, its projection p is the part of b along that line. When b is 
projected onto a plane, p is the part in that plane. 
There is a matrix P that multiplies b to give p. The projection is p = Pb. 


The picture in your mind should be Figure 4.3. For the first projection we go across to the 
z axis. For the second projection we drop straight down to the xy plane. One way gives 
$p_1 = (0, 0, 4)$ and the other way gives $p_2 = (2, 3, 0)$. Those are the parts of b = (2, 3, 4) 
along the line and in the plane. 

The matrices $P_1$ and $P_2$ are 3 by 3. They multiply b with 3 components to produce p 
with 3 components. Projection onto a line comes from a rank one matrix. Projection onto 
a plane comes from a rank two matrix: 

$$\text{Onto the } z \text{ axis:}\quad P_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad \text{Onto the } xy \text{ plane:}\quad P_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

$P_1$ picks out the z component of every vector. $P_2$ picks out the “in plane” component. To 
find $p_1$ and $p_2$, multiply by $P_1$ and $P_2$ (small p for the vector, capital P for the matrix that 
produces it): 

$$p_1 = P_1 b = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ z \end{bmatrix} \qquad p_2 = P_2 b = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x \\ y \\ 0 \end{bmatrix}$$


In this case the two projections are perpendicular. The xy plane and z axis are orthogonal 
subspaces, like the floor of a room and the line between two walls. More than that, the 
subspaces are orthogonal complements. Their dimensions add to 1 + 2 = 3, so every vector 
in the whole space is the sum of its parts in the two subspaces. The projections $p_1$ and $p_2$ 
are exactly those parts: 


$$\text{The vectors give } p_1 + p_2 = b. \qquad \text{The matrices give } P_1 + P_2 = I. \tag{4.4}$$
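
The two projections of b = (2, 3, 4) can be reproduced in a few lines. This is only a sketch in plain Python; `matvec` is a hypothetical helper for a matrix times a vector, not something defined in the text.

```python
def matvec(M, v):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

P1 = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]   # projection onto the z axis
P2 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]   # projection onto the xy plane
b = [2, 3, 4]

p1 = matvec(P1, b)                       # part of b along the z axis
p2 = matvec(P2, b)                       # part of b in the xy plane

# The two parts add back to b, and the two matrices add to the identity.
vector_sum = [u + v for u, v in zip(p1, p2)]
matrix_sum = [[s + t for s, t in zip(r1, r2)] for r1, r2 in zip(P1, P2)]
```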


This is perfect. Our goal is reached—for this example. We have the same goal for any 
line and any plane and any n-dimensional subspace. The object is to find the part p in 
each subspace, and the projection matrix P that produces p = Pb. Every subspace of $R^m$ 
has an m by m (square) projection matrix. To compute P, we absolutely need a good 
description of the subspace. 

The best description is to have a basis. The basis vectors go into the columns of A. 
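We are projecting onto the column space of A. Certainly the z axis is the column space 
of the following matrix $A_1$. The xy plane is the column space of $A_2$. That plane is also the 
column space of $A_3$ (a subspace has many bases): 

$$A_1 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \quad\text{and}\quad A_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad A_3 = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 0 & 0 \end{bmatrix}$$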


The problem is to project onto the column space of any m by n matrix A. Start with a line. 
Then n = 1. 


Projection Onto a Line 


We are given a point $b = (b_1, \ldots, b_m)$ in m-dimensional space. We are also given a line 
through the origin, in the direction of $a = (a_1, \ldots, a_m)$. We are looking along that line, 
to find the point p closest to b. The key is orthogonality: The line connecting b to p 
is perpendicular to the vector a. This is the dotted line in Figure 4.4, which we now 
compute by algebra. 

The projection p is some multiple of a (call it $p = \hat{x}a$). Our first step is to compute 
this unknown number $\hat{x}$. That will give the vector p. Then from the formula for p, we read 
off the projection matrix P. These three steps will lead to all projection matrices: find $\hat{x}$, 
then find p, then find P. 

The dotted line b - p is $b - \hat{x}a$. It is perpendicular to a, and this will determine $\hat{x}$. Use 
the fact that two vectors are perpendicular when their dot product is zero: 

$$a \cdot (b - \hat{x}a) = 0 \quad\text{or}\quad a \cdot b - \hat{x}\, a \cdot a = 0 \quad\text{or}\quad \hat{x} = \frac{a \cdot b}{a \cdot a} = \frac{a^T b}{a^T a}. \tag{4.5}$$

For vectors the multiplication $a^T b$ is the same as $a \cdot b$. Using the transpose is better, because 
it applies also to matrices. (We will soon meet $A^T b$.) Our formula for $\hat{x}$ immediately gives 
the formula for p: 


4E The projection of b onto the line through a is the vector $p = \hat{x}a = (a^T b / a^T a)\,a$. 


Special case 1: If b = a then $\hat{x} = 1$. The projection of a onto a is a itself. 
Special case 2: If b is perpendicular to a then $a^T b = 0$ and p = 0. 

Figure 4.4 The projection of b onto a line has length $\|p\| = \|b\| \cos\theta$. 


Example 4.6 Project $b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ onto $a = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$ to find $p = \hat{x}a$ in Figure 4.4. 


Solution The formula for $\hat{x}$ is $\frac{a^T b}{a^T a} = \frac{5}{9}$. The projection is $p = \frac{5}{9}a = \left(\frac{5}{9}, \frac{10}{9}, \frac{10}{9}\right)$. The 
error vector between b and p is e = b - p. The error $e = \left(\frac{4}{9}, -\frac{1}{9}, -\frac{1}{9}\right)$ should be 
perpendicular to a and it is: $e \cdot a = e^T a = \frac{4}{9} - \frac{2}{9} - \frac{2}{9} = 0$. 

Look at the right triangle of b, p, and e. The vector b is split into two parts: its 
component along the line is p, its perpendicular component is e. Those two sides of a right 
triangle have length $\|b\| \cos\theta$ and $\|b\| \sin\theta$. The trigonometry matches the dot product: 

$$p = \frac{a^T b}{a^T a}\,a \quad\text{so its length is}\quad \|p\| = \frac{\|a\|\,\|b\| \cos\theta}{\|a\|^2}\,\|a\| = \|b\| \cos\theta. \tag{4.6}$$

The dot product is simpler than getting involved with $\cos\theta$ and the length of b. The example 
has square roots in $\cos\theta = 5/(3\sqrt{3})$ and $\|b\| = \sqrt{3}$. There are no square roots in the 
projection $p = \frac{5}{9}a$. 
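
The three quantities in Example 4.6 can be computed exactly with rational arithmetic. The sketch below assumes integer input vectors; the function name `project_onto_line` is a hypothetical choice, and it simply follows the formula $\hat{x} = a^T b / a^T a$.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(ui * vi for ui, vi in zip(u, v))

def project_onto_line(b, a):
    """Return (x_hat, p, e) for the projection of b onto the line through a."""
    x_hat = Fraction(dot(a, b), dot(a, a))      # x_hat = a^T b / a^T a
    p = [x_hat * ai for ai in a]                # projection p = x_hat * a
    e = [bi - pi for bi, pi in zip(b, p)]       # error e = b - p
    return x_hat, p, e

a = [1, 2, 2]
b = [1, 1, 1]
x_hat, p, e = project_onto_line(b, a)
```

Exact fractions avoid the square roots that appear in $\cos\theta$ and $\|b\|$, which matches the remark that the dot product is the simpler route.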


Now comes the projection matrix. In the formula for p, what matrix is multiplying b? 
You can see it better if the number $\hat{x}$ is on the right side of a: 

$$p = a\hat{x} = a\,\frac{a^T b}{a^T a} = Pb \quad\text{when } P \text{ is the matrix}\quad \frac{aa^T}{a^T a}.$$

P is a column times a row! The column is a, the row is $a^T$. Then divide by the number $a^T a$. 
The matrix P is m by m, but its rank is one. We are projecting onto a one-dimensional 
subspace, the line through a. 


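Example 4.7 Find the projection matrix $P = \frac{aa^T}{a^T a}$ onto the line through $a = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$. 


Solution Multiply column times row and divide by $a^T a = 9$: 

$$P = \frac{aa^T}{a^T a} = \frac{1}{9} \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} \begin{bmatrix} 1 & 2 & 2 \end{bmatrix} = \begin{bmatrix} \frac{1}{9} & \frac{2}{9} & \frac{2}{9} \\ \frac{2}{9} & \frac{4}{9} & \frac{4}{9} \\ \frac{2}{9} & \frac{4}{9} & \frac{4}{9} \end{bmatrix}.$$

This matrix projects any vector b onto a. Check p = Pb for the particular b = (1, 1, 1) in 
Example 4.6: 

$$p = Pb = \begin{bmatrix} \frac{1}{9} & \frac{2}{9} & \frac{2}{9} \\ \frac{2}{9} & \frac{4}{9} & \frac{4}{9} \\ \frac{2}{9} & \frac{4}{9} & \frac{4}{9} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{5}{9} \\ \frac{10}{9} \\ \frac{10}{9} \end{bmatrix} \quad\text{which is correct.}$$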


Here are various remarks about P. If the vector a is doubled, the matrix P stays the 
same (it still projects onto the same line). If the matrix is squared, $P^2$ equals P (because 
projecting a second time doesn’t change anything). The diagonal entries of P add up to 
$\frac{1}{9} + \frac{4}{9} + \frac{4}{9} = 1$. 

The matrix I - P should be a projection too. It produces the other side e of the 
triangle, the perpendicular part of b. Note that (I - P)b equals b - p which is e. When 
P projects onto one subspace, I - P projects onto the perpendicular subspace. Here I - P 
projects onto the plane perpendicular to a. 
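
These remarks about P can be verified for a = (1, 2, 2). What follows is a sketch with exact fractions; the helper names `projection_matrix` and `matmul` are hypothetical, and the vector b = (1, 1, 1) is the one from Example 4.6.

```python
from fractions import Fraction

def projection_matrix(a):
    """P = a a^T / (a^T a), the rank one projection onto the line through a."""
    aTa = sum(ai * ai for ai in a)
    return [[Fraction(ai * aj, aTa) for aj in a] for ai in a]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

a = [1, 2, 2]
P = projection_matrix(a)
P_doubled = projection_matrix([2 * ai for ai in a])   # doubling a keeps the same line
trace = sum(P[i][i] for i in range(3))                # sum of the diagonal entries

# I - P projects onto the plane perpendicular to a; applied to b it gives e.
b = [1, 1, 1]
I_minus_P = [[(1 if i == j else 0) - P[i][j] for j in range(3)] for i in range(3)]
e = [sum(I_minus_P[i][j] * b[j] for j in range(3)) for i in range(3)]
```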

Now we move from a line in $R^m$ to an n-dimensional subspace of $R^m$. Projecting onto 
a subspace takes more effort. The crucial formulas are collected in equations (4.8) to (4.10). 


Projection Onto a Subspace 


Start with n vectors $a_1, \ldots, a_n$. Assume they are linearly independent. Find the combination 
$\hat{x}_1 a_1 + \cdots + \hat{x}_n a_n$ that is closest to a given vector b. This is our problem: To project 
every vector b onto the n-dimensional subspace spanned by the a’s. 


With n = 1 (only one vector) this is projection onto a line. The line is the column space of 
a matrix A, which has just one column. In general the matrix A has n columns $a_1, \ldots, a_n$. 
Their combinations in $R^m$ are the vectors Ax in the column space. We are looking for 
the particular combination $p = A\hat{x}$ (the projection) that is closest to b. The hat over x 
indicates the best choice, to give the closest vector in the column space. That choice is 
$\hat{x} = a^T b / a^T a$ when n = 1. 

We solve this problem for an n-dimensional subspace in three steps: Find the vector $\hat{x}$, 
find the projection $p = A\hat{x}$, find the matrix P. 

The key is in the geometry. The dotted line in Figure 4.5 goes from b to the nearest 
point $A\hat{x}$ in the subspace. This error vector $b - A\hat{x}$ is perpendicular to the subspace. The 
error $b - A\hat{x}$ makes a right angle with all the vectors $a_1, \ldots, a_n$. That gives the n equations 
we need to find $\hat{x}$: 


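$$\begin{matrix} a_1^T(b - A\hat{x}) = 0 \\ \vdots \\ a_n^T(b - A\hat{x}) = 0 \end{matrix} \quad\text{or}\quad \begin{bmatrix} a_1^T \\ \vdots \\ a_n^T \end{bmatrix} \begin{bmatrix} \\ b - A\hat{x} \\ \\ \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{4.7}$$

The matrix in those equations is $A^T$. The n equations are exactly $A^T(b - A\hat{x}) = 0$. 


Rewrite $A^T(b - A\hat{x}) = 0$ in its famous form $A^T A\hat{x} = A^T b$. This is the equation 
for $\hat{x}$, and the coefficient matrix is $A^T A$. Now we can find $\hat{x}$ and p and P: 

[Figure 4.5: the vector b, its projection $p = A\hat{x} = Pb$ in the plane of column $a_1$ and column $a_2$, and the error e with $a_1^T e = 0$ and $a_2^T e = 0$, so that $A^T e = A^T(b - A\hat{x}) = 0$.]

Figure 4.5 The projection p is the nearest point to b in the column space of A. The 
error e is in the nullspace of $A^T$. 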

4F The combination $\hat{x}_1 a_1 + \cdots + \hat{x}_n a_n = A\hat{x}$ that is closest to b comes from 

$$A^T(b - A\hat{x}) = 0 \quad\text{or}\quad A^T A\hat{x} = A^T b. \tag{4.8}$$

The matrix $A^T A$ is n by n and invertible. The solution is $\hat{x} = (A^T A)^{-1}A^T b$. The projection 
of b onto the subspace is 

$$p = A\hat{x} = A(A^T A)^{-1}A^T b. \tag{4.9}$$

This formula shows the projection matrix that produces p = Pb: 

$$P = A(A^T A)^{-1}A^T. \tag{4.10}$$


Compare with projection onto a line, when A has only one column a: 

$$\hat{x} = \frac{a^T b}{a^T a} \quad\text{and}\quad p = a\,\frac{a^T b}{a^T a} \quad\text{and}\quad P = \frac{aa^T}{a^T a}.$$

The formulas are identical! The number $a^T a$ becomes the matrix $A^T A$. When it is a number, 
we divide by it. When it is a matrix, we invert it. The new formulas contain $(A^T A)^{-1}$ 
instead of $1/a^T a$. The linear independence of the columns $a_1, \ldots, a_n$ will guarantee that 
this inverse matrix exists. 

The key step was the equation $A^T(b - A\hat{x}) = 0$. We used geometry (perpendicular 
vectors). Linear algebra gives this “normal equation” too, in a very quick way: 


1 Our subspace is the column space of A. 
2 The error vector $b - A\hat{x}$ is perpendicular to that subspace. 
3 Therefore $b - A\hat{x}$ is in the left nullspace. This means $A^T(b - A\hat{x}) = 0$. 

The left nullspace is important in projections. This nullspace of $A^T$ contains the error 
vector $e = b - A\hat{x}$. The vector b is being split into the projection p and the error e = b - p. 
Figure 4.5 shows these two parts in the two subspaces. 


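Example 4.8 If $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}$ and $b = \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix}$ find $\hat{x}$ and p and P. 


Solution Compute the square matrix $A^T A$ and also the vector $A^T b$: 

$$A^T A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix} \quad\text{and}\quad A^T b = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \end{bmatrix}.$$

Now solve the normal equation $A^T A\hat{x} = A^T b$ to find $\hat{x}$: 

$$\begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix} \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \end{bmatrix} \quad\text{gives}\quad \hat{x} = \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} = \begin{bmatrix} 5 \\ -3 \end{bmatrix}. \tag{4.11}$$

The combination $p = A\hat{x}$ is the projection of b onto the column space of A: 

$$p = 5 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - 3 \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \\ -1 \end{bmatrix}. \quad\text{The error is}\quad b - p = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}. \tag{4.12}$$

That solves the problem for one particular b. To solve it for every b, compute the 
matrix $P = A(A^T A)^{-1}A^T$. Find the determinant of $A^T A$ (which is 15 - 9 = 6) and find 
$(A^T A)^{-1}$. Then multiply A times $(A^T A)^{-1}$ times $A^T$ to reach P: 

$$(A^T A)^{-1} = \frac{1}{6} \begin{bmatrix} 5 & -3 \\ -3 & 3 \end{bmatrix} \quad\text{and}\quad P = \frac{1}{6} \begin{bmatrix} 5 & 2 & -1 \\ 2 & 2 & 2 \\ -1 & 2 & 5 \end{bmatrix}. \tag{4.13}$$

Two checks on the calculation. First, the error e = (1, -2, 1) is perpendicular to 
both columns (1, 1, 1) and (0, 1, 2). Second, the final P times b = (6, 0, 0) correctly gives 
p = (5, 2, -1). We must also have $P^2 = P$, because a second projection doesn’t change 
the first projection. 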
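
The whole calculation of Example 4.8 can be retraced step by step. The following is a minimal sketch with exact fractions, not a general solver: `inverse_2x2` only handles the 2 by 2 matrix $A^T A$ that arises here, and all helper names are hypothetical.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse_2x2(M):
    """Inverse of a 2 by 2 matrix with exact fractions (assumes det != 0)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 0], [1, 1], [1, 2]]
b = [[6], [0], [0]]                      # column vector stored as a 3 by 1 matrix

At = transpose(A)
AtA = matmul(At, A)                      # [[3, 3], [3, 5]]
Atb = matmul(At, b)                      # [[6], [0]]
AtA_inv = inverse_2x2(AtA)

x_hat = matmul(AtA_inv, Atb)             # solves the normal equation A^T A x = A^T b
p = matmul(A, x_hat)                     # projection p = A x_hat
e = [[bi[0] - pi[0]] for bi, pi in zip(b, p)]   # error e = b - p
P = matmul(matmul(A, AtA_inv), At)       # projection matrix A (A^T A)^{-1} A^T
```

Forming the inverse explicitly is fine for a 2 by 2 illustration; for larger problems one would normally solve the normal equations by elimination instead.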


Warning The matrix $P = A(A^T A)^{-1}A^T$ is deceptive. You might try to split $(A^T A)^{-1}$ 
into $A^{-1}$ times $(A^T)^{-1}$. If you make that mistake, and substitute it into P, you will find 
$P = AA^{-1}(A^T)^{-1}A^T$. Apparently everything cancels. This looks like P = I, the identity 
matrix. We want to say why this is wrong. 

The matrix A is rectangular. It has no inverse matrix. We cannot replace $AA^{-1}$ 
by I, because there is no $A^{-1}$ in the first place. 

In our experience, a problem that involves a rectangular matrix almost always leads 
to $A^T A$. We cannot split its inverse into $A^{-1}$ and $(A^T)^{-1}$, which don’t exist. What does 
exist is the inverse of the square matrix $A^T A$. This fact is so crucial that we state it clearly 
and give a proof. 

4G $A^T A$ is invertible if and only if the columns of A are linearly independent. 


Proof $A^T A$ is a square matrix (n by n). For every matrix A, we will show that $A^T A$ has 
the same nullspace as A. When the columns of A are linearly independent, this nullspace 
contains only the zero vector. Then $A^T A$, which has this same nullspace, is invertible. 

Let A be any matrix. If x is in its nullspace, then Ax = 0. Multiplying by $A^T$ gives 
$A^T Ax = 0$. So x is also in the nullspace of $A^T A$. 

Now start with the nullspace of $A^T A$. From $A^T Ax = 0$ we must prove that Ax = 0. 
We can’t multiply by $(A^T)^{-1}$, which generally doesn’t exist. A better way is to multiply 
by $x^T$: 

$$(x^T)A^T Ax = (x^T)0 \quad\text{or}\quad (Ax)^T(Ax) = 0 \quad\text{or}\quad \|Ax\|^2 = 0.$$

The vector Ax has length zero. Therefore Ax = 0. Every vector x in one nullspace is 
in the other nullspace. If A has dependent columns, so does $A^T A$. If A has independent 
columns, so does $A^T A$. This is the good case: 


When A has independent columns, $A^T A$ is square, symmetric, and invertible. 


To repeat for emphasis: $A^T A$ is (n by m) times (m by n). It is always square (n by n). 
It is always symmetric, because its transpose is $(A^T A)^T = A^T (A^T)^T$ which equals $A^T A$. 
We also proved that $A^T A$ is invertible, provided A has independent columns. Watch the 
difference between dependent and independent columns: 


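$$A^T A = \begin{bmatrix} 1 & 1 & 0 \\ 2 & 2 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 4 & 8 \end{bmatrix} \quad \text{(A has dependent columns, } A^T A \text{ is singular)}$$

$$A^T A = \begin{bmatrix} 1 & 1 & 0 \\ 2 & 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 4 & 9 \end{bmatrix} \quad \text{(A has independent columns, } A^T A \text{ is invertible)}$$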
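
The contrast between dependent and independent columns shows up in the determinant of $A^T A$. This sketch uses two small matrices chosen for illustration (one with a second column equal to twice the first, one with independent columns); the helper names `gram` and `det_2x2` are hypothetical.

```python
def gram(A):
    """Return A^T A for a matrix A stored as a list of rows."""
    cols = list(zip(*A))
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]

def det_2x2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A_dependent = [[1, 2], [1, 2], [0, 0]]     # second column = 2 * first column
A_independent = [[1, 2], [1, 2], [0, 1]]   # columns are independent

G_dep = gram(A_dependent)                  # singular: determinant is zero
G_ind = gram(A_independent)                # invertible: determinant is nonzero
```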


Very brief summary To find the projection $p = \hat{x}_1 a_1 + \cdots + \hat{x}_n a_n$, solve $A^T A\hat{x} = A^T b$. 
The projection is $A\hat{x}$ and the error is $e = b - p = b - A\hat{x}$. The projection matrix 
$P = A(A^T A)^{-1}A^T$ gives p = Pb. 

This matrix satisfies $P^2 = P$. The distance from b to the subspace is $\|e\|$. 


Problem Set 4.2 


Questions 1—9 ask for projections onto lines. Also errors e = b — p and matrices P. 


1 Project the vector b onto the line through a. Check that e is perpendicular to a: 


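(a) $b = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$ and $a = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ (b) $b = \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}$ and $a = \begin{bmatrix} -1 \\ -3 \\ -1 \end{bmatrix}$ 

2 Draw the projection of b onto a and also compute it from $p = \hat{x}a$: 

(a) b = ($\cos\theta$, ...) and a = (1, ...) (b) b = (1, ...) and a = (1, ...) 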


3 In Problem 1, find the projection matrices $P_1 = aa^T/a^T a$ and similarly $P_2$ onto the 
lines through a in (a) and (b). Verify that $P_1^2 = P_1$. Multiply Pb in each case to 
compute the projection p. 

4 Construct the projection matrices $P_1$ and $P_2$ onto the lines through the a’s in Problem 2. 
Verify that $P_2^2 = P_2$. Explain why $P_2^2$ should equal $P_2$. 

5 Compute the projection matrices $aa^T/a^T a$ onto the lines through $a_1 = (-1, 2, 2)$ and 
$a_2 = (2, 2, -1)$. Multiply those projection matrices and explain why their product 
$P_1 P_2$ is what it is. 

6 Project b = (1, 0, 0) onto the lines through $a_1$ and $a_2$ in Problem 5 and also onto 
$a_3 = (2, -1, 2)$. Add up the three projections $p_1 + p_2 + p_3$. 

7 Continuing Problems 5-6, find the projection matrix $P_3$ onto $a_3 = (2, -1, 2)$. Verify 
that $P_1 + P_2 + P_3 = I$. 

[Figures for Questions 5-6-7 and Questions 8-9-10] 

8 Project the vector b = (1, 1) onto the lines through $a_1 = (1, 0)$ and $a_2 = (1, 2)$. 
Draw the projections $p_1$ and $p_2$ onto the second figure and add $p_1 + p_2$. The projections 
do not add to b because the a’s are not orthogonal. 

9 Project $a_1 = (1, 0)$ onto $a_2 = (1, 2)$. Then project the result back onto $a_1$. Draw 
these projections on a copy of the second figure. Multiply the projection matrices 
$P_1 P_2$: Is this a projection? 

10 The projection of b onto the plane of $a_1$ and $a_2$ will equal b. The projection matrix 
is P = ____. Check $P = A(A^T A)^{-1}A^T$ for $A = \begin{bmatrix} a_1 & a_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}$. 

Questions 11-20 ask for projections, and projection matrices, onto subspaces. 


11 Project b onto the column space of A by solving $A^T A\hat{x} = A^T b$ and $p = A\hat{x}$: 

(a) $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$ and $b = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$ (b) $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & 1 \end{bmatrix}$ and $b = \begin{bmatrix} 4 \\ 4 \\ 6 \end{bmatrix}$ 

Find e = b - p. It should be perpendicular to the columns of A. 

12 Compute the projection matrices $P_1$ and $P_2$ onto the column spaces in Problem 11. 
Verify that $P_1 b$ gives the first projection $p_1$. Also verify $P_2^2 = P_2$. 

13 Suppose A is the 4 by 4 identity matrix with its last column removed. A is 4 by 3. 
Project b = (1, 2, 3, 4) onto the column space of A. What shape is the projection 
matrix P and what is P? 

14 Suppose b equals 2 times the first column of A. What is the projection of b onto the 
column space of A? Is P = I in this case? Compute p and P when b = (0, 2, 4) 
and the columns of A are (0, 1, 2) and (1, 2, 0). 

15 If A is doubled, then $P = 2A(4A^T A)^{-1}2A^T$. This is the same as $A(A^T A)^{-1}A^T$. 
The column space of 2A is the same as ____. Is $\hat{x}$ the same for A and 2A? 

16 What linear combination of (1, 2, -1) and (1, 0, 1) is closest to b = (2, 1, 1)? 

17 (Important) If $P^2 = P$ show that $(I - P)^2 = I - P$. When P projects onto the 
column space of A, I - P projects onto the ____. 

18 (a) If P is the 2 by 2 projection matrix onto the line through (1, 1), then I - P is 
the projection matrix onto ____. 

(b) If P is the 3 by 3 projection matrix onto the line through (1, 1, 1), then I - P 
is the projection matrix onto ____. 

19 To find the projection matrix onto the plane x - y - 2z = 0, choose two vectors in 
that plane and make them the columns of A. The plane should be the column space. 
Then compute $P = A(A^T A)^{-1}A^T$. 

20 To find the projection matrix P onto the same plane x - y - 2z = 0, write down a 
vector e that is perpendicular to that plane. Compute the projection $Q = ee^T/e^T e$ 
and then P = I - Q. 


Questions 21-26 show that projection matrices satisfy P? = P and P! = P. 


21 Multiply the matrix $P = A(A^T A)^{-1}A^T$ by itself. Cancel to prove that $P^2 = P$. 
Explain why P(Pb) always equals Pb: The vector Pb is in the column space so its 
projection is ____. 

22 Prove that $P = A(A^T A)^{-1}A^T$ is symmetric by computing $P^T$. Remember that the 
inverse of a symmetric matrix is symmetric. 

23 If A is square and invertible, the warning against splitting $(A^T A)^{-1}$ does not apply: 
$P = AA^{-1}(A^T)^{-1}A^T = I$. When A is invertible, why is P = I? What is the 
error e? 

24 The nullspace of $A^T$ is ____ to the column space R(A). So if $A^T b = 0$, the 
projection of b onto R(A) should be p = ____. Check that $P = A(A^T A)^{-1}A^T$ 
gives this answer. 

25 Explain why the projection matrix P onto an n-dimensional subspace has rank r = n. 
What is the column space of P? 

26 If an m by m matrix has $A^2 = A$ and its rank is m, prove that A = I. 

27 The important fact in Theorem 4G is this: If $A^T Ax = 0$ then Ax = 0. New proof: 
The vector Ax is in the nullspace of ____. Ax is always in the column space of 
____. To be in both perpendicular spaces, Ax must be zero. 


28 The first four wavelets are in the columns of this wavelet matrix W: 


What is special about the columns of W? Find the inverse wavelet transform $W^{-1}$. 
What is the relation of $W^{-1}$ to W? 


It often happens that Ax = b has no solution. The usual reason is: too many equations. The 
matrix has more rows than columns. There are more equations than unknowns (m is greater 
than n). The n columns span a small part of m-dimensional space. Unless all measurements 
are perfect, b is outside that column space. Elimination reaches an impossible equation and 
stops. But these are real problems and they need an answer. 

To repeat: We cannot always get the error e = b — Ax down to zero. When e is 
zero, X is an exact solution to Ax = b. When the length of e is as small as possible, x is 
a least-squares solution. Our goal in this section is to compute ¥ and use it. 


The previous section emphasized p (the projection). This section emphasizes $\hat{x}$ (the
least-squares solution). They are connected by $p = A\hat{x}$. The fundamental equation is still
$A^TA\hat{x} = A^Tb$. When the original Ax = b has no solution, multiply by $A^T$ and solve
$A^TA\hat{x} = A^Tb$.

Example 4.9 Find the closest straight line to three points (0, 6), (1, 0), and (2, 0).


No straight line goes through those points. We are asking for two numbers C and D that 
satisfy three equations. The line is b = C + Dt. Here are the equations at t = 0,1, 2 to 
match the given values b = 6, 0, 0: 


The first point is on the line if  $C + D \cdot 0 = 6$
The second point is on the line if $C + D \cdot 1 = 0$
The third point is on the line if  $C + D \cdot 2 = 0$.


This 3 by 2 system has no solution. The right side b is not a combination of the columns 
of A: 


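$$A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} \qquad x = \begin{bmatrix} C \\ D \end{bmatrix} \qquad b = \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix} \qquad Ax = b \text{ is unsolvable.}$$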


The same numbers were in Example 4.8 in the last section. In practical problems, the data 
points are closer to a line than these points. But they don’t match any C + Dt, and there 
easily could be m = 100 points instead of m = 3. The numbers 6, 0, 0 exaggerate the error 
so you can see it clearly. 


Minimizing the Error 


How do we make $e = b - Ax$ as small as possible? This is an important question with a
beautiful answer. The best x (called $\hat{x}$) can be found by geometry or algebra or calculus:


By geometry All vectors Ax lie in the plane of the columns (1, 1, 1) and (0, 1, 2). In that 
plane, we are looking for the point closest to b. The nearest point is the projection p. 


The best choice for $A\hat{x}$ is p. The smallest possible error is $e = b - p$.


By algebra Every vector b has a part in the column space of A and a perpendicular part 
in the left nullspace. The column space part is p. The left nullspace part is e. There is an 
equation we cannot solve (Ax = b). There is an equation we do solve (by removing e): 


$$Ax = b = p + e \ \text{ is impossible;} \qquad A\hat{x} = p \ \text{ is least squares.} \tag{4.14}$$

The solution to $A\hat{x} = p$ makes the error as small as possible, because for any x:

$$\|Ax - b\|^2 = \|Ax - p\|^2 + \|e\|^2. \tag{4.15}$$

This is the law $c^2 = a^2 + b^2$ for a right triangle. The vector $Ax - p$ in the column space is
perpendicular to e in the left nullspace. We reduce $Ax - p$ to zero by choosing x to be $\hat{x}$.
That leaves the smallest possible error (namely e).

Notice what “smallest” means. The squared length of $Ax - b$ is being minimized:


The least-squares approximation makes $E = \|Ax - b\|^2$ as small as possible.


By calculus Most functions are minimized by calculus! The derivatives are zero at the 
minimum. The graph bottoms out and the slope in every direction is zero. Here the error 
function to be minimized is a sum of squares: 


$$E = \|Ax - b\|^2 = (C + D \cdot 0 - 6)^2 + (C + D \cdot 1)^2 + (C + D \cdot 2)^2. \tag{4.16}$$


The unknowns are C and D. Those are the components of x, and they determine the line. 
With two unknowns there are two derivatives—both zero at the minimum. They are called 
“partial derivatives” because the C derivative treats D as constant and the D derivative 
treats C as constant: 


$$\partial E/\partial C = 2(C + D \cdot 0 - 6) + 2(C + D \cdot 1) + 2(C + D \cdot 2) = 0$$
$$\partial E/\partial D = 2(C + D \cdot 0 - 6)(0) + 2(C + D \cdot 1)(1) + 2(C + D \cdot 2)(2) = 0.$$


$\partial E/\partial D$ contains the extra factors 0, 1, 2. Those are the numbers in E that multiply D.
They appear because of the chain rule. (The derivative of $(4 + 5x)^2$ is 2 times $4 + 5x$
times an extra 5.) In the C derivative the corresponding factors are (1)(1)(1), because C is
always multiplied by 1. It is no accident that 1, 1, 1 and 0, 1, 2 are the columns of A.
Now cancel 2 from every term and collect all C’s and all D’s:

$$\begin{aligned} \text{The C derivative } \partial E/\partial C \text{ is zero:} \quad & 3C + 3D = 6 \\ \text{The D derivative } \partial E/\partial D \text{ is zero:} \quad & 3C + 5D = 0. \end{aligned} \tag{4.17}$$


Figure 4.6 The closest line has heights p = (5, 2, −1) with errors e = (1, −2, 1). The
equations $A^TA\hat{x} = A^Tb$ give $\hat{x} = (5, -3)$. The line is b = 5 − 3t and the projection is
$5a_1 - 3a_2$.

[Figure 4.7: the row space (left) and the column space (right). Ax = b is not possible because b is not in the column space; b = p + e. Independent columns, no nullspace.]

Figure 4.7 The projection $p = A\hat{x}$ is closest to b, so $\|b - p\|^2 = \|b - A\hat{x}\|^2$ gives minimum
error.


These equations are identical with $A^TA\hat{x} = A^Tb$. The best C and D are the components
of $\hat{x}$. The equations (4.17) from calculus are the same as the “normal equations”
from linear algebra:


The partial derivatives of $\|Ax - b\|^2$ are zero when $A^TA\hat{x} = A^Tb$.


The solution is C = 5 and D = —3. Therefore b = 5 — 3t is the best line—it comes 
closest to the three points. At t = 0, 1, 2 this line goes through 5, 2, —1. It could not go 
through 6, 0, 0. The errors are 1, —2, 1. This is the vector e! 

Figure 4.6a shows the closest line to the three points. It misses by distances $e_1, e_2, e_3$.
Those are vertical distances. The least-squares line is chosen to minimize $E = e_1^2 + e_2^2 + e_3^2$.

Figure 4.6b shows the same problem in another way (in 3-dimensional space). The
vector b is not in the column space of A. That is why we could not put a line through the
three points. The smallest possible error is the perpendicular vector e to the plane. This is
$e = b - A\hat{x}$, the vector of errors (1, −2, 1) in the three equations, and the distances from
the best line. Behind both figures is the fundamental equation $A^TA\hat{x} = A^Tb$.

Notice that the errors 1, −2, 1 add to zero. The error $e = (e_1, e_2, e_3)$ is perpendicular
to the first column (1, 1, 1) in A. The dot product gives $e_1 + e_2 + e_3 = 0$.
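
As a check on Example 4.9, the following minimal Python sketch forms the entries of $A^TA$ and $A^Tb$ for the three points and solves the 2 by 2 normal equations exactly. The helper name `dot` and the use of Cramer's rule are choices made for this illustration only; they are not taken from the text.

```python
from fractions import Fraction

# Data from Example 4.9: heights b = 6, 0, 0 at times t = 0, 1, 2.
ts = [0, 1, 2]
bs = [6, 0, 0]

# The columns of A are (1, 1, 1) and (t_1, t_2, t_3).
col1 = [1, 1, 1]
col2 = ts

def dot(u, v):
    """Exact dot product of two vectors."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))

# Entries of A^T A and A^T b.
a11, a12, a22 = dot(col1, col1), dot(col1, col2), dot(col2, col2)
r1, r2 = dot(col1, bs), dot(col2, bs)

# Solve the 2 by 2 normal equations by Cramer's rule.
det = a11 * a22 - a12 * a12
C = (r1 * a22 - a12 * r2) / det
D = (a11 * r2 - a12 * r1) / det

p = [C + D * t for t in ts]            # heights of the best line
e = [b - pi for b, pi in zip(bs, p)]   # vertical errors b - p

print(C, D)   # 5 -3
print(p, e)
```

The error vector produced this way has zero dot product with both columns of A, which is the orthogonality that the normal equations express.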


The Big Picture 


The key figure of this book shows the four subspaces and the true action of a matrix.
The vector x on the left side of Figure 4.2 went to b = Ax on the right side. In that figure
x was split into $x_r + x_n$.

In this section the situation is just the opposite. There are no solutions to Ax = b.
Instead of splitting up x we are splitting up b. Figure 4.7 shows the big picture for least
squares. Instead of Ax = b we solve $A\hat{x} = p$. The error $e = b - p$ is unavoidable.

Notice how the missing nullspace N(A) is very small: just one point. With independent
columns, the only solution to Ax = 0 is x = 0. Then $A^TA$ is invertible. The equation
$A^TA\hat{x} = A^Tb$ fully determines the best vector $\hat{x}$.

Section 7.2 will have the complete picture, with all four subspaces included. Every x
splits into $x_r + x_n$, and every b splits into p + e. The best solution is still $\hat{x}$ (or $\hat{x}_r$) in the
row space. We can’t help e and we don’t want $x_n$; this leaves $A\hat{x} = p$.


Fitting a Straight Line 


This is the clearest application of least squares. It starts with m points in a plane, hopefully
near a straight line. At times $t_1, \ldots, t_m$ those points are at heights $b_1, \ldots, b_m$. Figure 4.6a
shows the best line b = C + Dt, which misses the points by distances $e_1, \ldots, e_m$. Those
are vertical distances. No line is perfect, and the least-squares line is chosen to minimize
$E = e_1^2 + \cdots + e_m^2$.

The first example in this section had three points. Now we allow m points (m can be
large). The algebra still leads to the same two equations $A^TA\hat{x} = A^Tb$. The components
of $\hat{x}$ are still C and D.

Figure 4.6b shows the same problem in another way (in m-dimensional space). A 
line goes exactly through the m points when we exactly solve Ax = b. Generally we can’t 
do it. There are only two unknowns C and D because A has n = 2 columns: 


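$$Ax = b \ \text{ is } \ \begin{aligned} C + Dt_1 &= b_1 \\ C + Dt_2 &= b_2 \\ &\ \ \vdots \\ C + Dt_m &= b_m \end{aligned} \quad \text{with} \quad A = \begin{bmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix}. \tag{4.18}$$

The column space is so thin that almost certainly b is outside of it. The components of e
are the errors $e_1, \ldots, e_m$.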

When b happens to lie in the column space, the points happen to lie on a line. In that 
case b = p. Then Ax = b is solvable and the errors are e = (0,..., 0). 


The closest line has heights $p_1, \ldots, p_m$ with errors $e_1, \ldots, e_m$.
The equations $A^TA\hat{x} = A^Tb$ give $\hat{x} = (C, D)$. The errors are $e_i = b_i - C - Dt_i$.


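Fitting points by straight lines is so important that we give the equations once and
for all. Remember that b = C + Dt exactly fits the data points if

$$\begin{aligned} C + Dt_1 &= b_1 \\ &\ \ \vdots \\ C + Dt_m &= b_m \end{aligned} \qquad \text{or} \qquad \begin{bmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}. \tag{4.19}$$

This is our equation Ax = b. It is generally unsolvable, if A is rectangular. But there is
one good feature. The columns of A are independent (unless all times $t_i$ are the same). So
we turn to least squares and solve $A^TA\hat{x} = A^Tb$. The “dot-product matrix” $A^TA$ is 2 by 2:

$$A^TA = \begin{bmatrix} 1 & \cdots & 1 \\ t_1 & \cdots & t_m \end{bmatrix} \begin{bmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix} = \begin{bmatrix} m & \sum t_i \\ \sum t_i & \sum t_i^2 \end{bmatrix}. \tag{4.20}$$

On the right side of the equation is the 2 by 1 vector $A^Tb$:

$$A^Tb = \begin{bmatrix} 1 & \cdots & 1 \\ t_1 & \cdots & t_m \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix} = \begin{bmatrix} \sum b_i \\ \sum t_ib_i \end{bmatrix}. \tag{4.21}$$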


In a specific problem, all these numbers are given. The m equations reduce to two equa- 
tions. A formula for C and D will be in equation (4.30), but solving the two equations 
directly is probably faster than substituting into the formula. 


4H The line C + Dt which minimizes $e_1^2 + \cdots + e_m^2$ is determined by $A^TA\hat{x} = A^Tb$:

$$\begin{bmatrix} m & \sum t_i \\ \sum t_i & \sum t_i^2 \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} \sum b_i \\ \sum t_ib_i \end{bmatrix}. \tag{4.22}$$

The vertical errors at the m points on the line are the components of e = b − p.

As always, those equations come from geometry or linear algebra or calculus. The
error vector $b - A\hat{x}$ is perpendicular to the columns of A (geometry). It is in the nullspace
of $A^T$ (linear algebra). The best $\hat{x} = (C, D)$ minimizes the total error E, the sum of
squares:


$$E(x) = \|Ax - b\|^2 = (C + Dt_1 - b_1)^2 + \cdots + (C + Dt_m - b_m)^2.$$

When calculus sets the derivatives $\partial E/\partial C$ and $\partial E/\partial D$ to zero, it produces the two
equations in (4.22).

Other least-squares problems have more than two unknowns. Fitting a parabola has
n = 3 coefficients C, D, E (see below). In general x stands for n unknowns $x_1, \ldots, x_n$.
The matrix A has n columns. The total error is a function $E(x) = \|Ax - b\|^2$ of n variables.
Its derivatives give the n equations $A^TA\hat{x} = A^Tb$. The derivative of a square is linear, and this
is why the method of least squares is so popular.
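
The two normal equations for a straight line can be written as a small routine. The sketch below is one possible way to do it in Python: it builds the sums $m$, $\sum t_i$, $\sum t_i^2$, $\sum b_i$, $\sum t_ib_i$ and solves the 2 by 2 system in closed form. The function name `fit_line` is hypothetical, and exact rational arithmetic is used only to keep the illustration free of rounding.

```python
from fractions import Fraction

def fit_line(ts, bs):
    """Return (C, D) for the least-squares line b = C + D t.

    Solves the normal equations
        m C      + (sum t) D   = sum b
        (sum t) C + (sum t^2) D = sum t*b
    """
    m = len(ts)
    ts = [Fraction(t) for t in ts]
    bs = [Fraction(b) for b in bs]
    st = sum(ts)
    stt = sum(t * t for t in ts)
    sb = sum(bs)
    stb = sum(t * b for t, b in zip(ts, bs))
    det = m * stt - st * st
    if det == 0:
        # Happens exactly when all times are equal (dependent columns).
        raise ValueError("all times are equal; the columns of A are dependent")
    C = (sb * stt - st * stb) / det
    D = (m * stb - st * sb) / det
    return C, D

# The three points of Example 4.9.
print(fit_line([0, 1, 2], [6, 0, 0]))
```

When the points already lie on a line, the same routine returns that line and every error is zero.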


Fitting by a Parabola 


If we throw a ball, it would be crazy to fit the path by a straight line. A parabola
$b = C + Dt + Et^2$ allows the ball to go up and come down again (b is the height at time t).
The actual path is not a perfect parabola, but the whole theory of projectiles starts with that
approximation.

When Galileo dropped a stone from the Leaning Tower of Pisa, it accelerated. The
distance contains a quadratic term $\frac{1}{2}gt^2$. (Galileo’s point was that the stone’s mass is not
involved.) Without that term we could never send a satellite into the right orbit. But even
with a nonlinear function like $t^2$, the unknowns C, D, E appear linearly. Choosing the best
parabola is still a problem in linear algebra.


Problem Fit $b_1, \ldots, b_m$ at times $t_1, \ldots, t_m$ by a parabola $b = C + Dt + Et^2$.


With m > 3 points, the equations to fit a parabola exactly are generally unsolvable:

$$\begin{aligned} C + Dt_1 + Et_1^2 &= b_1 \\ &\ \ \vdots \\ C + Dt_m + Et_m^2 &= b_m \end{aligned} \quad \text{has the m by 3 matrix} \quad A = \begin{bmatrix} 1 & t_1 & t_1^2 \\ \vdots & \vdots & \vdots \\ 1 & t_m & t_m^2 \end{bmatrix}. \tag{4.23}$$

Least squares The best parabola chooses $\hat{x} = (C, D, E)$ to satisfy the three normal
equations $A^TA\hat{x} = A^Tb$.


May we ask you to convert this to a problem of projection? The column space of A
has dimension ____. The projection of b is $p = A\hat{x}$, which combines the three columns
using the coefficients ____, ____, ____. The error at the first data point is
$e_1 = b_1 - C - Dt_1 - Et_1^2$. The total squared error is $E = e_1^2 +$ ____. If you prefer to minimize
by calculus, take the partial derivatives of E with respect to ____, ____, ____. These
three derivatives will be zero when $\hat{x} = (C, D, E)$ solves the 3 by 3 system of equations
____.


Example 4.10 For a parabola $b = C + Dt + Et^2$ to go through b = 6, 0, 0 when t =
0, 1, 2, the equations are

$$\begin{aligned} C + D \cdot 0 + E \cdot 0^2 &= 6 \\ C + D \cdot 1 + E \cdot 1^2 &= 0 \\ C + D \cdot 2 + E \cdot 2^2 &= 0. \end{aligned} \tag{4.24}$$

This is Ax = b. We can solve it exactly. Three data points give three equations and a
square matrix. The solution is x = (C, D, E) = (6, −9, 3). The parabola through the
three points is $b = 6 - 9t + 3t^2$.

What does this mean for projection? The matrix has three columns, which span the
whole space $\mathbf{R}^3$. The projection matrix is I. The projection of b is b. The error is zero.
We didn’t need $A^TA\hat{x} = A^Tb$, because we solved Ax = b. Of course we could multiply
by $A^T$, but there is no reason to do it.

Figure 4.8a shows the parabola through the three points. It also shows a fourth
point $b_4$ at time $t_4$. If that falls on the parabola, the new Ax = b (four equations) is still
solvable. When the fourth point is not on the parabola, we turn to $A^TA\hat{x} = A^Tb$. Will the
least-squares parabola stay the same, with all the error at the fourth point? Not likely!

An error vector $(0, 0, 0, e_4)$ would not be perpendicular to the column (1, 1, 1, 1).
Least squares balances out the four errors, and they add to zero.

[Figure 4.8: the parabola $b = 6 - 9t + 3t^2$ through the three points; the vector $(6, 0, 0, b_4)$ is in $\mathbf{R}^4$.]

Figure 4.8 Exact fit of parabola through three points means p = b and e = 0. Fourth
point requires least squares.
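
The parabola fit can be sketched the same way as the line fit, with three columns 1, t, $t^2$ and a 3 by 3 system $A^TA\hat{x} = A^Tb$. The Python sketch below is an illustration under stated assumptions: the names `solve` and `fit_parabola` are hypothetical, the small Gauss-Jordan solver assumes $A^TA$ is invertible, and the fourth data point (3, 0) is a made-up value chosen only to show the errors balancing.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve the square system M x = rhs by Gauss-Jordan elimination (exact).

    Assumes M is invertible; a singular M makes next() raise StopIteration.
    """
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_parabola(ts, bs):
    """Return [C, D, E] from the 3 by 3 normal equations A^T A x = A^T b."""
    rows = [[1, t, t * t] for t in ts]          # rows of A
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * b for r, b in zip(rows, bs)) for i in range(3)]
    return solve(AtA, Atb)

# Three points: the fit is exact and reproduces Example 4.10.
print(fit_parabola([0, 1, 2], [6, 0, 0]))

# Four points: the fourth point (3, 0) is a made-up value off the parabola.
ts4, bs4 = [0, 1, 2, 3], [6, 0, 0, 0]
C, D, E = fit_parabola(ts4, bs4)
errors = [b - (C + D * t + E * t * t) for t, b in zip(ts4, bs4)]
print(errors, sum(errors))
```

With the made-up fourth point, all four errors are nonzero and their sum is zero, as the perpendicularity to the column (1, 1, 1, 1) requires.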


Problem Set 4.3 


Problems 1-10 use four data points to bring out the key ideas.

1 (Straight line b = C + Dt through four points) With b = 0, 8, 8, 20 at times
t = 0, 1, 3, 4, write down the four equations Ax = b (unsolvable). Change the
measurements to p = 1, 5, 13, 17 and find an exact solution to $A\hat{x} = p$.

2 With b = 0, 8, 8, 20 at t = 0, 1, 3, 4, set up and solve the normal equations
$A^TA\hat{x} = A^Tb$. For the best straight line in Figure 4.9a, find its four heights and four errors.
What is the minimum value of $E = e_1^2 + e_2^2 + e_3^2 + e_4^2$?

3 Compute $p = A\hat{x}$ for the same b and A using $A^TA\hat{x} = A^Tb$. Check that e = b − p
is perpendicular to both columns of A. What is the shortest distance $\|e\|$ from b to
the column space?

4 (Use calculus) Write down $E = \|Ax - b\|^2$ as a sum of four squares involving C
and D. Find the derivative equations $\partial E/\partial C = 0$ and $\partial E/\partial D = 0$. Divide by 2 to
obtain the normal equations $A^TA\hat{x} = A^Tb$.

5 Find the height C of the best horizontal line to fit b = (0, 8, 8, 20). An exact fit
would solve the unsolvable equations C = 0, C = 8, C = 8, C = 20. Find the
4 by 1 matrix A in these equations and solve $A^TA\hat{x} = A^Tb$. Redraw Figure 4.9a to
show the best height $\hat{x} = C$ and the four errors in e.

6 Project b = (0, 8, 8, 20) onto the line through a = (1, 1, 1, 1). Find $\hat{x} = a^Tb/a^Ta$
and the projection $p = \hat{x}a$. Redraw Figure 4.9b and check that e = b − p is
perpendicular to a. What is the shortest distance $\|e\|$ from b to the line in your
figure?

[Figure 4.9: (a) the four points $b_1 = 0$, $b_2 = b_3 = 8$, $b_4 = 20$ at times $t_1 = 0$, $t_2 = 1$, $t_3 = 3$, $t_4 = 4$; (b) the vector b = (0, 8, 8, 20), the columns $a_1 = (1, 1, 1, 1)$ and $a_2 = (0, 1, 3, 4)$, the projection $p = Ca_1 + Da_2$, and the error e.]

Figure 4.9 The closest line in 2 dimensions matches the projection in $\mathbf{R}^4$.


7 Find the closest line b = Dt, through the origin, to the same four points. An exact
fit would solve $D \cdot 0 = 0$, $D \cdot 1 = 8$, $D \cdot 3 = 8$, $D \cdot 4 = 20$. Find the 4 by 1 matrix
and solve $A^TA\hat{x} = A^Tb$. Redraw Figure 4.9a showing the best slope $\hat{x} = D$ and the
four errors.


8 Project b = (0, 8, 8, 20) onto the line through a = (0, 1, 3, 4). Find $\hat{x}$ and $p = \hat{x}a$.
The best C in Problems 5-6 and the best D in Problems 7-8 do not agree with the
best (C, D) in Problems 1-4. That is because (1, 1, 1, 1) and (0, 1, 3, 4) are ____
perpendicular.


9 For the closest parabola $b = C + Dt + Et^2$ to the same four points, write down
the unsolvable equations Ax = b. Set up the three normal equations $A^TA\hat{x} = A^Tb$
(solution not required). In Figure 4.9a you are now fitting a parabola. What is
happening in Figure 4.9b?


10 For the closest cubic $b = C + Dt + Et^2 + Ft^3$ to the same four points, write down
the four equations Ax = b. Solve them by elimination. In Figure 4.9a this cubic now
goes exactly through the points. Without computation write p and e in Figure 4.9b.


11 The average of the four times is $\bar{t} = \frac{1}{4}(0 + 1 + 3 + 4) = 2$. The average of the
four b’s is $\bar{b} = \frac{1}{4}(0 + 8 + 8 + 20) = 9$.


(a) Verify that the best line goes through the center point $(\bar{t}, \bar{b}) = (2, 9)$.
(b) Explain why $C + D\bar{t} = \bar{b}$ comes from the first of the two equations in (4.22).


Questions 12-16 introduce basic ideas of statistics, the foundation for least squares.


12 (Recommended) This problem projects $b = (b_1, \ldots, b_m)$ onto the line through
a = (1, ..., 1).

(a) Solve $a^Ta\hat{x} = a^Tb$ to show that $\hat{x}$ is the mean of the b’s.

(b) Find the error vector e and the variance $\|e\|^2$ and the standard deviation $\|e\|$.

(c) Draw a graph with b = (1, 2, 6) fitted by a horizontal line. What are p and e
on the graph? Check that p is perpendicular to e and find the matrix P.


13 First assumption behind least squares: Each measurement error has “expected value”
zero. Multiply the eight error vectors $b - Ax = (\pm 1, \pm 1, \pm 1)$ by $(A^TA)^{-1}A^T$ to
show that the eight vectors $\hat{x} - x$ also average to zero. The expected value of $\hat{x}$ is
the correct x.


14 Second assumption behind least squares: The measurement errors are independent
and have the same variance $\sigma^2$. The average of $(b - Ax)(b - Ax)^T$ is $\sigma^2 I$. Multiply
on the left by $(A^TA)^{-1}A^T$ and multiply on the right by $A(A^TA)^{-1}$ to show that the
average of $(\hat{x} - x)(\hat{x} - x)^T$ is $\sigma^2(A^TA)^{-1}$. This is the “covariance matrix” for the
error in $\hat{x}$.


15 A doctor takes m readings $b_1, \ldots, b_m$ of your pulse rate. The least-squares solution
to the m equations $x = b_1, x = b_2, \ldots, x = b_m$ is the average
$\hat{x} = (b_1 + \cdots + b_m)/m$. The matrix A is a column of 1’s. Problem 14 gives the expected error
$(\hat{x} - x)^2$ as $\sigma^2(A^TA)^{-1} =$ ____. By taking m measurements, the variance drops
from $\sigma^2$ to $\sigma^2/m$.


16 If you know the average $\hat{x}_{99}$ of 99 numbers $b_1, \ldots, b_{99}$, how can you quickly find the
average $\hat{x}_{100}$ with one more number $b_{100}$? The idea of recursive least squares is to
avoid adding 100 numbers. What coefficient correctly gives $\hat{x}_{100}$ from $b_{100}$ and $\hat{x}_{99}$?

$$\tfrac{1}{100}b_{100} + \underline{\qquad}\,\hat{x}_{99} = \tfrac{1}{100}(b_1 + \cdots + b_{100}).$$


Questions 17-25 give more practice with $\hat{x}$ and p and e.


17 Write down three equations for the line b = C + Dt to go through b = 7 at t = −1,
b = 7 at t = 1, and b = 21 at t = 2. Find the least-squares solution $\hat{x} = (C, D)$ and
draw the closest line.


18 Find the projection $p = A\hat{x}$ in Problem 17. This gives the three ____ of the closest
line. Show that the error vector is e = (2, −6, 4).


19 Suppose the measurements at t = −1, 1, 2 are the errors 2, −6, 4 in Problem 18.
Compute $\hat{x}$ and the closest line. Explain the answer: b = (2, −6, 4) is perpendicular
to ____ so the projection is p = 0.


20 Suppose the measurements at t = −1, 1, 2 are b = (5, 13, 17). Compute $\hat{x}$ and the
closest line and e. The error is e = 0 because this b is ____.


21 Which of the four subspaces contains the error vector e? Which contains p? Which
contains $\hat{x}$? What is the nullspace of A?


22 Find the best line C + Dt to fit b = 4, 2, −1, 0, 0 at times t = −2, −1, 0, 1, 2.


23 (Distance between lines) The points P = (x, x, x) are on a line through (1, 1, 1)
and Q = (y, 3y, −1) are on another line. Choose x and y to minimize the squared
distance $\|P - Q\|^2$.


24 Is the error vector e orthogonal to b or p or e or $\hat{x}$? Show that $\|e\|^2$ equals $e^Tb$ which
equals $b^Tb - b^Tp$. This is the smallest total error E.


25 The derivatives of $\|Ax\|^2$ with respect to the variables $x_1, \ldots, x_n$ fill the vector
$2A^TAx$. The derivatives of $2b^TAx$ fill the vector $2A^Tb$. So the derivatives of
$\|Ax - b\|^2$ are zero when ____.


4.4 Orthogonal Bases and Gram-Schmidt 


This section has two goals. The first is to see how orthogonal vectors make calculations
simpler. Dot products are zero, which removes the work in $A^TA$. The second goal is to
construct orthogonal vectors. We will pick combinations of the original vectors to produce 
right angles. Those original vectors are the columns of A. The orthogonal vectors will be 
the columns of a new matrix Q. 

You know from Chapter 3 what a basis consists of—independent vectors that span 
the space. Numerically, we always compute with a basis. It gives a set of coordinate axes. 
The axes could meet at any angle (except 0° and 180°). But every time we visualize axes, 
they are perpendicular. In our imagination, the coordinate axes are practically always 
orthogonal. This simplifies the picture and it greatly simplifies the computations. 

The vectors $q_1, \ldots, q_n$ are orthogonal when their dot products $q_i \cdot q_j$ are zero. More
exactly $q_i^Tq_j = 0$ whenever $i \neq j$. With one more step (just divide each vector by its
length) the vectors become orthogonal unit vectors. Their lengths are all 1. Then the
basis is called orthonormal.


Definition The vectors $q_1, \ldots, q_n$ are orthonormal if

$$q_i^Tq_j = \begin{cases} 0 & \text{when } i \neq j \quad \text{(orthogonal vectors)} \\ 1 & \text{when } i = j \quad \text{(unit vectors: } \|q_i\| = 1) \end{cases}$$

A matrix with orthonormal columns is assigned the special letter Q.


The matrix Q is easy to work with because $Q^TQ = I$. This says in matrix language
that the columns are orthonormal. It is equation (4.25) below, and Q is not required to be
square. When the matrix is square, $Q^TQ = I$ means that $Q^T = Q^{-1}$.


4I A matrix Q with orthonormal columns satisfies $Q^TQ = I$:

$$Q^TQ = \begin{bmatrix} q_1^T \\ q_2^T \\ q_n^T \end{bmatrix} \begin{bmatrix} q_1 & q_2 & q_n \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I. \tag{4.25}$$

When row i of $Q^T$ multiplies column j of Q, the dot product is $q_i^Tq_j$. Off the diagonal
($i \neq j$) that is zero by orthogonality. On the diagonal (i = j) the unit vectors give
$q_i^Tq_i = \|q_i\|^2 = 1$.

If the columns are only orthogonal (not unit vectors), then $Q^TQ$ is a diagonal matrix
(not the identity matrix). We wouldn’t use the letter Q. But this matrix is almost as good.
The important thing is orthogonality; then it is easy to produce unit vectors.

To repeat: $Q^TQ = I$ even when Q is rectangular. In that case $Q^T$ is only an inverse
from the left. For square matrices we also have $QQ^T = I$, so $Q^T$ is the two-sided inverse
of Q. The rows of a square Q are orthonormal like the columns. To invert the matrix we
just transpose it. In this square case we call Q an orthogonal matrix.

Here are three examples of orthogonal matrices—rotation and permutation and re- 
flection. 


Example 4.11 (Rotation) Q rotates every vector in the plane through the angle $\theta$:

$$Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad \text{and} \quad Q^T = Q^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$

The columns of Q are orthogonal (take their dot product). They are unit vectors because
$\sin^2\theta + \cos^2\theta = 1$. Those columns give an orthonormal basis for the plane $\mathbf{R}^2$. The
standard basis vectors i and j are rotated through $\theta$ (see Figure 4.10a).

$Q^{-1}$ rotates vectors back through $-\theta$. It agrees with $Q^T$, because the cosine of $-\theta$
is the cosine of $\theta$, and $\sin(-\theta) = -\sin\theta$. We have $Q^TQ = I$ and $QQ^T = I$.


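Example 4.12 (Permutation) These matrices change the order of the components:

$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} y \\ z \\ x \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix}.$$

All columns of these Q’s are unit vectors (their lengths are obviously 1). They are also
orthogonal (the 1’s appear in different places). The inverse of a permutation matrix is its
transpose. The inverse puts the components back into their original order:

$$\text{Inverse = transpose:} \quad \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} y \\ z \\ x \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} y \\ x \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}.$$

The 2 by 2 permutation is a reflection across the 45° line in Figure 4.10b.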


Example 4.13 (Reflection) If u is any unit vector, set $Q = I - 2uu^T$. Notice that $uu^T$ is
a matrix while $u^Tu$ is the number $\|u\|^2 = 1$. Then $Q^T$ and $Q^{-1}$ both equal Q:

$$Q^T = I - 2uu^T = Q \quad \text{and} \quad Q^TQ = I - 4uu^T + 4uu^Tuu^T = I. \tag{4.26}$$

† “Orthonormal matrix” would have been a better name for Q, but it’s not used. Any matrix with orthonormal
columns can be Q, but we only call it an orthogonal matrix when it is square.

Figure 4.10 Rotation by $Q = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}$ and reflection by $Q = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.

Reflection matrices $I - 2uu^T$ are symmetric and also orthogonal. If you square them, you
get the identity matrix. Reflecting twice through a mirror brings back the original, and
$Q^2 = Q^TQ = I$. Notice $u^Tu = 1$ near the end of equation (4.26).

As examples we choose the unit vectors $u_1 = (1, 0)$ and then $u_2 = (1/\sqrt{2}, -1/\sqrt{2})$.
Compute $2uu^T$ (column times row) and subtract from I:

$$Q_1 = I - 2\begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad Q_2 = I - 2\begin{bmatrix} .5 & -.5 \\ -.5 & .5 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

$Q_1$ reflects $u_1 = (1, 0)$ across the y axis to (−1, 0). Every vector (x, y) goes into its mirror
image (−x, y) across the y axis:

$$\text{Reflection from } Q_1: \quad \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -x \\ y \end{bmatrix}.$$

$Q_2$ is reflection across the 45° line. Every (x, y) goes to (y, x); this was the permutation
in Example 4.12. A vector like (3, 3) doesn’t move when you exchange 3 and 3, because it is on
the mirror line. Figure 4.10b shows the mirror.
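
The reflection formula $Q = I - 2uu^T$ is easy to try numerically. The following Python sketch is only an illustration (the helper names `reflection` and `matmul` are hypothetical): it builds Q from a unit vector u and squares it. With floating-point square roots the entries are close to, not exactly equal to, the values in the text.

```python
import math

def reflection(u):
    """Return Q = I - 2 u u^T as a list of rows, for a unit vector u."""
    n = len(u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] for j in range(n)]
            for i in range(n)]

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

u1 = [1.0, 0.0]
u2 = [1 / math.sqrt(2), -1 / math.sqrt(2)]

Q1 = reflection(u1)           # reflection across the y axis
Q2 = reflection(u2)           # approximately [[0, 1], [1, 0]]
Q2_squared = matmul(Q2, Q2)   # approximately the identity

print(Q1)
print(Q2)
print(Q2_squared)
```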


Rotations preserve the length of a vector. So do reflections. So do permutations.
So does multiplication by any orthogonal matrix: lengths don’t change. This is a key
property of Q:

4J If Q has orthonormal columns ($Q^TQ = I$), it leaves lengths unchanged:

$$\|Qx\| = \|x\| \ \text{ for every vector } x. \tag{4.27}$$

Q also preserves dot products and angles: $(Qx)^T(Qy) = x^TQ^TQy = x^Ty$.


Proof $\|Qx\|^2$ is the same as $\|x\|^2$ because $(Qx)^T(Qx) = x^TQ^TQx = x^TIx$. Orthogonal
matrices are excellent for computations: numbers can never grow too large when lengths
are fixed. Good computer codes use Q’s as much as possible, to be numerically stable.
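
A quick numerical illustration of (4.27), using the rotation of Example 4.11 as the orthogonal matrix. This is a sketch only: the function names, the angle 0.7 and the two test vectors are arbitrary choices for the illustration.

```python
import math

def rotate(theta, v):
    """Apply the 2 by 2 rotation matrix Q(theta) to the vector v = (x, y)."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = v
    return (c * x - s * y, s * x + c * y)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

x, y = (3.0, 4.0), (-1.0, 2.0)
theta = 0.7                      # arbitrary angle in radians
Qx, Qy = rotate(theta, x), rotate(theta, y)

# Lengths and dot products before and after the rotation.
print(math.sqrt(dot(x, x)), math.sqrt(dot(Qx, Qx)))
print(dot(x, y), dot(Qx, Qy))
```

Up to rounding, the two numbers printed on each line agree: the rotation changes the vectors but not their lengths or their dot product.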


Projections Using Orthogonal Bases: Q Replaces A

This chapter is about projections onto subspaces. We developed the equations for $\hat{x}$ and p
and P. When the columns of A were a basis for the subspace, all formulas involved $A^TA$.
(You have not forgotten $A^TA\hat{x} = A^Tb$.) The entries of $A^TA$ are the dot products $a_i \cdot a_j$.

Suppose the basis vectors are not only independent but orthonormal. The a’s become
q’s. Their dot products are zero (except $q_i \cdot q_i = 1$). The new matrix Q has
orthonormal columns: $A^TA$ simplifies to $Q^TQ = I$. Look at the improvements in $\hat{x}$ and
p and P. Instead of $Q^TQ$ we print a blank for the identity matrix:

$$\hat{x} = \underline{\quad}\,Q^Tb \quad \text{and} \quad p = Q\hat{x} \quad \text{and} \quad P = Q\,\underline{\quad}\,Q^T. \tag{4.28}$$


The least-squares solution of Qx = b is $\hat{x} = Q^Tb$. The projection matrix simplifies to
$P = QQ^T$.

There are no matrices to invert, and Gaussian elimination is not needed. This is the point
of an orthonormal basis. The components of $\hat{x} = Q^Tb$ are just dot products of b with
the rows of $Q^T$, which are the q’s. We have n separate 1-dimensional projections. The
“coupling matrix” or “correlation matrix” $A^TA$ is now $Q^TQ = I$. There is no coupling.
Here is a display of $\hat{x} = Q^Tb$ and $p = Q\hat{x}$:

$$\hat{x} = \begin{bmatrix} q_1^T \\ \vdots \\ q_n^T \end{bmatrix} b = \begin{bmatrix} q_1^Tb \\ \vdots \\ q_n^Tb \end{bmatrix} \qquad p = \begin{bmatrix} q_1 & \cdots & q_n \end{bmatrix} \begin{bmatrix} q_1^Tb \\ \vdots \\ q_n^Tb \end{bmatrix} = q_1(q_1^Tb) + \cdots + q_n(q_n^Tb).$$

When Q is a square matrix, the subspace is the whole space. Then $Q^T$ is the same as $Q^{-1}$,
and $\hat{x} = Q^Tb$ is the same as $x = Q^{-1}b$. The solution is exact! The projection of b onto
the whole space is b itself. In this case p = b and $P = QQ^T = I$.

You may think that projection onto the whole space is not worth mentioning. But
when p = b, our formula assembles b out of its 1-dimensional projections. If $q_1, \ldots, q_n$
is an orthonormal basis for the whole space, every b is the sum of its components along
the q’s:

$$b = q_1(q_1^Tb) + q_2(q_2^Tb) + \cdots + q_n(q_n^Tb). \tag{4.29}$$

That is $QQ^T = I$. It is the foundation of Fourier series and all the great “transforms” of
applied mathematics. They break vectors or functions into perpendicular pieces. Then by
adding the pieces, the inverse transform puts the function back together.


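Example 4.14 The columns of this matrix Q are orthonormal vectors $q_1, q_2, q_3$:

$$Q = \frac{1}{3}\begin{bmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{bmatrix} \quad \text{has first column} \quad q_1 = \begin{bmatrix} -1/3 \\ 2/3 \\ 2/3 \end{bmatrix}.$$

The separate projections of b = (0, 0, 1) onto $q_1$ and $q_2$ and $q_3$ are

$$q_1(q_1^Tb) = \tfrac{2}{3}q_1 \quad \text{and} \quad q_2(q_2^Tb) = \tfrac{2}{3}q_2 \quad \text{and} \quad q_3(q_3^Tb) = -\tfrac{1}{3}q_3.$$

The sum of the first two is the projection of b onto the plane of $q_1$ and $q_2$. The sum of all
three is the projection of b onto the whole space, which is b itself:

$$\tfrac{2}{3}q_1 + \tfrac{2}{3}q_2 - \tfrac{1}{3}q_3 = \frac{1}{9}\begin{bmatrix} -2 + 4 - 2 \\ 4 - 2 - 2 \\ 4 + 4 + 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = b.$$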
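
The separate one-dimensional projections of Example 4.14 can be reproduced in a few lines. The Python sketch below assumes the matrix Q of that example is one third of the rows (−1, 2, 2), (2, −1, 2), (2, 2, −1); the helper names `dot`, `scale` and `add` are hypothetical. Exact fractions avoid rounding.

```python
from fractions import Fraction

third = Fraction(1, 3)
# Columns of Q as assumed above for Example 4.14.
q1 = [-third, 2 * third, 2 * third]
q2 = [2 * third, -third, 2 * third]
q3 = [2 * third, 2 * third, -third]
b = [0, 0, 1]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def scale(c, v):
    return [c * x for x in v]

def add(*vs):
    return [sum(parts) for parts in zip(*vs)]

qs = (q1, q2, q3)
coeffs = [dot(q, b) for q in qs]                   # the components of Q^T b
pieces = [scale(c, q) for c, q in zip(coeffs, qs)]  # q_i (q_i^T b)

p_plane = add(pieces[0], pieces[1])   # projection onto the plane of q1 and q2
p_all = add(*pieces)                  # projection onto the whole space

print(coeffs)
print(p_plane)
print(p_all)
```

No matrix is inverted anywhere: each coefficient is a single dot product, and the three pieces add back to b.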


Example 4.15 Fitting a straight line leads to Ax = b and least squares. The columns of A
are orthogonal in one special case: when the measurement times add to zero. Suppose
b = 1, 2, 4 at times t = −2, 0, 2. Those times add to zero. The dot product with 1, 1, 1 is
zero and A has orthogonal columns:

$$\begin{aligned} C + D(-2) &= 1 \\ C + D(0) &= 2 \\ C + D(2) &= 4 \end{aligned} \qquad \text{or} \qquad Ax = \begin{bmatrix} 1 & -2 \\ 1 & 0 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}.$$

The measurements 1, 2, 4 are not on a line. There is no exact C and D and x. Look at the
matrix $A^TA$ in the least-squares equation for $\hat{x}$:

$$A^TA\hat{x} = A^Tb \quad \text{is} \quad \begin{bmatrix} 3 & 0 \\ 0 & 8 \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} 7 \\ 6 \end{bmatrix}.$$

Main point: $A^TA$ is diagonal. We can solve separately for $C = \frac{7}{3}$ and $D = \frac{6}{8}$. The zeros
in $A^TA$ are dot products of perpendicular columns in A. The denominators 3 and 8 are
not 1 and 1, because the columns are not unit vectors. But a diagonal matrix is virtually as
good as the identity matrix.


Orthogonal columns are so helpful that it is worth moving the time origin to produce
them. To do that, subtract away the average time $\bar{t} = (t_1 + \cdots + t_m)/m$. Then the shifted
measurement times $T_i = t_i - \bar{t}$ add to zero. With the columns now orthogonal, $A^TA$ is
diagonal. The best C and D (like $\frac{7}{3}$ and $\frac{6}{8}$ above) have direct formulas without inverse
matrices:

$$C = \frac{b_1 + \cdots + b_m}{m} \quad \text{and} \quad D = \frac{b_1T_1 + \cdots + b_mT_m}{T_1^2 + \cdots + T_m^2}. \tag{4.30}$$

The best line is $C + DT$ or $C + D(t - \bar{t})$. The time shift is an example of the Gram-
Schmidt process, which orthogonalizes the columns in advance. We now see how that
process works for other matrices A. It changes A to Q.
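
The shifted-time formulas can be sketched directly. In the Python illustration below, the function name `fit_line_centered` is hypothetical; it returns C, D and the mean time so that the best line is C + D(t − mean). If all times are equal the denominator is zero and the division fails, which matches the dependent-columns case.

```python
from fractions import Fraction

def fit_line_centered(ts, bs):
    """Return (C, D, t_bar) so that the best line is C + D * (t - t_bar)."""
    m = len(ts)
    ts = [Fraction(t) for t in ts]
    bs = [Fraction(b) for b in bs]
    t_bar = sum(ts) / m
    Ts = [t - t_bar for t in ts]     # shifted times, which add to zero
    C = sum(bs) / m                  # the mean of the measurements
    D = sum(b * T for b, T in zip(bs, Ts)) / sum(T * T for T in Ts)
    return C, D, t_bar

# Example 4.15: the times already add to zero.
print(fit_line_centered([-2, 0, 2], [1, 2, 4]))

# Example 4.9: the times 0, 1, 2 are shifted by their mean 1.
C, D, t_bar = fit_line_centered([0, 1, 2], [6, 0, 0])
print(C, D, t_bar, C - D * t_bar)   # the last value is the intercept at t = 0
```

For the data of Example 4.9 the shifted form 2 − 3(t − 1) is the same line as 5 − 3t.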


The Gram-Schmidt Process 


The point of this section is that “orthogonal is good.” Projections and least-squares
solutions normally need the inverse of $A^TA$. When this matrix becomes $Q^TQ = I$, the inverse
is no problem. If the vectors are orthonormal, the one-dimensional projections are
uncoupled and p is their sum. The best $\hat{x}$ is $Q^Tb$ (n separate dot products with b). For this to
be true, we had to say “If the vectors are orthonormal.” Now we find a way to make them
orthonormal.

Figure 4.11 Subtract projections from b and c to find B and C. Divide by $\|A\|$, $\|B\|$,
and $\|C\|$.

Start with three independent vectors a, b, c. We intend to construct three orthogonal
vectors A, B, C. Then (at the end is easiest) we divide A, B, C by their lengths. That
produces three orthonormal vectors $q_1 = A/\|A\|$, $q_2 = B/\|B\|$, $q_3 = C/\|C\|$.


Gram-Schmidt Begin by choosing A = a. This gives the first direction. The next
direction B must be perpendicular to A. Start with b and subtract its projection along A.
This leaves the perpendicular part, which is the orthogonal vector B:

$$B = b - \frac{A^Tb}{A^TA}A. \tag{4.31}$$

A and B are orthogonal in Figure 4.11. Take the dot product with A to verify that
$A^TB = A^Tb - A^Tb = 0$. This vector B is what we have called the error vector e, perpendicular
to A. Notice that B in equation (4.31) is not zero (otherwise a and b would be dependent).
The directions A and B are now set.

The third direction starts with c. This is not a combination of A and B (because c
is not a combination of a and b). But most likely c is not perpendicular to A and B. So
subtract off its components in those two directions to get C:

$$C = c - \frac{A^Tc}{A^TA}A - \frac{B^Tc}{B^TB}B. \tag{4.32}$$

This is the one and only idea of the Gram-Schmidt process. Subtract from every new
vector its projections in the directions already set. That idea is repeated at every step.†
If we also had a fourth vector d, we would subtract its projections onto A, B, and C.
That gives D. At the end, divide the orthogonal vectors A, B, C, D by their lengths. The
resulting vectors $q_1, q_2, q_3, q_4$ are orthonormal.


† We think Gram had the idea. We don’t know where Schmidt came in.

Example 4.16 Suppose the independent vectors a, b, c are

$$a = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 2 \\ 0 \\ -2 \end{bmatrix} \quad \text{and} \quad c = \begin{bmatrix} 3 \\ -3 \\ 3 \end{bmatrix}.$$

Then A = a has $A^TA = 2$. Subtract from b its projection along A:

$$B = b - \frac{A^Tb}{A^TA}A = b - \frac{2}{2}A = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}.$$

Check: $A^TB = 0$ as required. Now subtract two projections from c:

$$C = c - \frac{A^Tc}{A^TA}A - \frac{B^Tc}{B^TB}B = c - \frac{6}{2}A + \frac{6}{6}B = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$

Check: C is perpendicular to A and B. Finally convert A, B, C to unit vectors (length 1,
orthonormal). Divide by the lengths $\sqrt{2}$ and $\sqrt{6}$ and $\sqrt{3}$. The orthonormal basis is

$$q_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \quad \text{and} \quad q_2 = \frac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix} \quad \text{and} \quad q_3 = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$


Usually A, B, C contain fractions. Almost always q1, q2, q3 contain square roots. These 
vectors had m = 3 components, but the steps of Gram-Schmidt are always the same. 
Subtract projections and then change to unit vectors. 
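
The steps of Example 4.16 can be traced in a few lines of code. The following Python sketch is an illustration and not part of the original text: the helper names `dot` and `gram_schmidt` are assumptions, and exact fractions are used so that A, B, C appear without rounding before the final division by the lengths.

```python
from fractions import Fraction
from math import sqrt

def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def gram_schmidt(vectors):
    """Return orthogonal (not yet normalized) vectors A, B, C, ..."""
    orthogonal = []
    for v in vectors:
        w = [Fraction(x) for x in v]
        # Subtract the projection of the original vector v along each earlier direction u.
        for u in orthogonal:
            coeff = Fraction(dot(u, v)) / dot(u, u)
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        orthogonal.append(w)
    return orthogonal

# The vectors a, b, c of Example 4.16.
a, b, c = [1, -1, 0], [2, 0, -2], [3, -3, 3]
A, B, C = gram_schmidt([a, b, c])

# Divide by the lengths at the end to get the orthonormal q's.
lengths = [sqrt(dot(u, u)) for u in (A, B, C)]
q = [[float(x) / n for x in u] for u, n in zip((A, B, C), lengths)]
```

Here `A`, `B`, `C` hold the orthogonal vectors of the example and `lengths` holds their lengths, whose squares are 2, 6, and 3.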


The Factorization A = QR 


We started with a matrix A, whose columns were a, b, c. We ended with a matrix Q, 
whose columns are q1, q2, q3. How are those matrices related? Since the vectors a, b, c 
are combinations of the q's (and vice versa), there must be a third matrix connecting A 
to Q. Call it R. 

The first step was $q_1 = a/\|a\|$ (other vectors not involved). The second equation 
was (4.31), where b is a combination of A and B. At that stage C and q3 were not involved. 
This non-involvement of later vectors is the key point of the process: 


- The vectors a and A and q1 are along a single line. 
- The vectors a, b and A, B and q1, q2 are in a single plane. 
- The vectors a, b, c and A, B, C and q1, q2, q3 are in a subspace (dimension 3). 

At every step, a1, ..., ak are combinations of q1, ..., qk. Later q's are not involved. The 
connecting matrix R is a triangular matrix: 


$$\begin{bmatrix} a & b & c \end{bmatrix} = \begin{bmatrix} q_1 & q_2 & q_3 \end{bmatrix} \begin{bmatrix} q_1^{T}a & q_1^{T}b & q_1^{T}c \\ 0 & q_2^{T}b & q_2^{T}c \\ 0 & 0 & q_3^{T}c \end{bmatrix} \quad\text{or}\quad A = QR. \tag{4.33}$$


A = QR is Gram-Schmidt in a nutshell. To understand those dot products in R, premultiply 
A = QR by $Q^{T}$. Since $Q^{T}Q = I$ this leaves $Q^{T}A = R$. The entries of $Q^{T}A$ are rows 
times columns. They are dot products of q1, q2, q3 with a, b, c. That is what you see in R. 

Look in particular at the second column of A = QR. The matrix multiplication is 
saying that $b = q_1(q_1^{T}b) + q_2(q_2^{T}b)$. This is exactly like equation (4.29), except that the 
third term $q_3(q_3^{T}b)$ is not present. At the second stage of Gram-Schmidt, q3 is not involved! 
The dot product $q_3^{T}b$ is zero in column 2 of R. 


4K (Gram-Schmidt and A = QR) The Gram-Schmidt process starts with independent 
vectors a1, ..., an. It constructs orthonormal vectors q1, ..., qn. The matrices with these 
columns satisfy A = QR. R is upper triangular because later q's are orthogonal to earlier 
a's: $R_{ij} = q_i^{T}a_j = 0$ when i > j. 


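Here are the a's and q's from the example. The i, j entry of $R = Q^{T}A$ is the dot 
product of $q_i$ with $a_j$: 


$$A = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 0 & -3 \\ 0 & -2 & 3 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{6} & 1/\sqrt{3} \\ -1/\sqrt{2} & 1/\sqrt{6} & 1/\sqrt{3} \\ 0 & -2/\sqrt{6} & 1/\sqrt{3} \end{bmatrix} \begin{bmatrix} \sqrt{2} & \sqrt{2} & \sqrt{18} \\ 0 & \sqrt{6} & -\sqrt{6} \\ 0 & 0 & \sqrt{3} \end{bmatrix} = QR.$$


The lengths of A, B, C are the numbers $\sqrt{2}$, $\sqrt{6}$, $\sqrt{3}$ on the diagonal of R. Because of the 
square roots, QR looks less beautiful than LU. Both factorizations are absolutely central 
to calculations in linear algebra. 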

Any m by n matrix A with independent columns can be factored into QR. The m 
by n matrix Q has orthonormal columns, and the square matrix R is upper triangular with 
positive diagonal. We must not forget why this is useful for least squares: $A^{T}A$ equals 
$R^{T}Q^{T}QR = R^{T}R$. The least-squares equation $A^{T}A\hat{x} = A^{T}b$ simplifies to 


$$R^{T}R\hat{x} = R^{T}Q^{T}b \quad\text{or}\quad R\hat{x} = Q^{T}b. \tag{4.34}$$


Instead of solving Ax = b, which is impossible, we solve $R\hat{x} = Q^{T}b$ by back substitution, 
which is very fast. The real cost is the $mn^{2}$ multiplications in the Gram-Schmidt 
process, which are needed to find Q and R. 
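
Equation (4.34) can be tried on a small case. The sketch below is only an illustration: the 3 by 2 matrix A = [[1, 1], [1, 3], [1, 5]], its factors Q and R (worked out by hand), the right side `b`, and the function name `back_substitute` are all assumptions chosen for the example, not taken from the text.

```python
from math import sqrt

def back_substitute(R, c):
    """Solve the upper triangular system R x = c, last unknown first."""
    n = len(c)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - s) / R[i][i]
    return x

# Hand-computed factors of A = [[1, 1], [1, 3], [1, 5]]: Q is 3 by 2, R is 2 by 2.
Q = [[1 / sqrt(3), -1 / sqrt(2)],
     [1 / sqrt(3), 0.0],
     [1 / sqrt(3), 1 / sqrt(2)]]
R = [[sqrt(3), 3 * sqrt(3)],
     [0.0, 2 * sqrt(2)]]
b = [1.0, 2.0, 4.0]  # illustrative right side (an assumption)

# c = Q^T b: dot products of the columns of Q with b.
c = [sum(Q[i][k] * b[i] for i in range(3)) for k in range(2)]

# Solve R x_hat = Q^T b by back substitution.
x_hat = back_substitute(R, c)
```

The result agrees with the normal equations for this A and b, which read 3C + 9D = 7 and 9C + 35D = 27 and give C = 1/12, D = 3/4.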


Here is an informal code. It executes equations (4.35) and (4.36), for k = 1 then k = 2 
and eventually k = n. Equation (4.35) divides orthogonal vectors by their lengths $r_{kk}$ to 
get unit vectors. The orthogonal vectors are called a1, a2, a3, ... like the original vectors, 
but equation (4.36) has put A, B, C, ... in their place. At the start, k = 1 and a1 = A has 
components $a_{i1}$: 


$$r_{kk} = \Big(\sum_{i=1}^{m} a_{ik}^{2}\Big)^{1/2} \quad\text{and}\quad q_{ik} = \frac{a_{ik}}{r_{kk}}, \qquad i = 1, \ldots, m. \tag{4.35}$$

From each later $a_j$ (j = k + 1, ..., n), subtract its projection on this new $q_k$. The dot 
product $q_k \cdot a_j$ is $r_{kj}$ in the triangular matrix R: 

$$r_{kj} = \sum_{i=1}^{m} q_{ik}a_{ij} \quad\text{and}\quad a_{ij} = a_{ij} - q_{ik}r_{kj}, \qquad i = 1, \ldots, m. \tag{4.36}$$


Increase k to k + 1 and return to (4.35). Starting from a, b, c = a1, a2, a3 this code will 

1 Construct $q_1 = a_1/\|a_1\|$. 

2 Construct $B = a_2 - (q_1^{T}a_2)q_1$ and $C^{*} = a_3 - (q_1^{T}a_3)q_1$. 

3 Construct $q_2 = B/\|B\|$. 

4 Construct $C = C^{*} - (q_2^{T}C^{*})q_2$. 

5 Construct $q_3 = C/\|C\|$. 


Equation (4.36) subtracts off projections as soon as the new vector $q_k$ is found. This 
change to "one projection at a time" is called modified Gram-Schmidt. It is numerically 
more stable than equation (4.32) which subtracts all projections at once. 
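
The informal code in equations (4.35) and (4.36) can be written out as a short program. This Python version is a sketch under stated assumptions: the function name `modified_gram_schmidt` and the row-list storage of the matrix are choices made for the illustration, and the columns are assumed independent so that no $r_{kk}$ is zero.

```python
from math import sqrt

def modified_gram_schmidt(A):
    """Factor A = QR. A is a list of m rows with n entries; columns must be independent."""
    m, n = len(A), len(A[0])
    a = [row[:] for row in A]              # working copy, overwritten column by column
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # (4.35): r_kk is the length of column k, and q_k is that column divided by r_kk.
        R[k][k] = sqrt(sum(a[i][k] ** 2 for i in range(m)))
        for i in range(m):
            Q[i][k] = a[i][k] / R[k][k]
        # (4.36): remove the q_k component from every later column right away.
        for j in range(k + 1, n):
            R[k][j] = sum(Q[i][k] * a[i][j] for i in range(m))
            for i in range(m):
                a[i][j] -= Q[i][k] * R[k][j]
    return Q, R

# The matrix of Example 4.16, with columns a, b, c.
A = [[1, 2, 3], [-1, 0, -3], [0, -2, 3]]
Q, R = modified_gram_schmidt(A)
```

For the matrix of Example 4.16 the diagonal of `R` holds the lengths of A, B, C, and multiplying `Q` by `R` recovers the original matrix up to roundoff.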


Problem Set 4.4 


Problems 1-12 are about orthogonal vectors and orthogonal matrices. 


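1 Are these pairs of vectors orthonormal or only orthogonal or only independent? 


$$\text{(a)}\ \begin{bmatrix} 1 \\ 0 \end{bmatrix} \text{ and } \begin{bmatrix} -1 \\ 1 \end{bmatrix} \qquad \text{(b)}\ \begin{bmatrix} .6 \\ .8 \end{bmatrix} \text{ and } \begin{bmatrix} .4 \\ -.3 \end{bmatrix} \qquad \text{(c)}\ \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \text{ and } \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}.$$

Change the second vector when necessary to produce orthonormal vectors. 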


2 The vectors (2, 2, -1) and (-1, 2, 2) are orthogonal. Divide them by their lengths 
to find orthonormal vectors q1 and q2. Put those into the columns of Q and multiply 
$Q^{T}Q$ and $QQ^{T}$. 


3 (a) If A has three orthogonal columns each of length 4, what is $A^{T}A$? 
(b) If A has three orthogonal columns of lengths 1, 2, 3, what is $A^{T}A$? 


4 Give an example of each of the following: 


(a) A matrix Q that has orthonormal columns but $QQ^{T} \neq I$. 

(b) Two orthogonal vectors that are not linearly independent. 

(c) An orthonormal basis for $R^{4}$, where every component is 1/2 or -1/2. 


5 Find two orthogonal vectors that lie in the plane x + y + 2z = 0. Make them 
orthonormal. 


6 If Q1 and Q2 are orthogonal matrices, show that their product Q1Q2 is also an 
orthogonal matrix. (Use $Q^{T}Q = I$.) 


7 If the columns of Q are orthonormal, what is the least-squares solution $\hat{x}$ to Qx = b? 
Give an example with $b \neq 0$ but $\hat{x} = 0$. 


8 (a) Compute $P = QQ^{T}$ when q1 = (.8, .6, 0) and q2 = (-.6, .8, 0). Verify that 
$P^{2} = P$. 
(b) Prove that always $(QQ^{T})(QQ^{T}) = QQ^{T}$ by using $Q^{T}Q = I$. $P = QQ^{T}$ is 
the projection matrix onto the column space of Q. 

9 Orthonormal vectors are automatically linearly independent. 
(a) Vector proof: When c1q1 + c2q2 + c3q3 = 0, what dot product leads to c1 = 0? 
Similarly c2 = 0 and c3 = 0 and the q's are independent. 
(b) Matrix proof: Show that Qx = 0 leads to x = 0. Since Q may be rectangular, 
you can use $Q^{T}$ but not $Q^{-1}$. 

10 (a) Find orthonormal vectors q1 and q2 in the plane of a = (1, 3, 4, 5, 7) and 
b = (-6, 6, 8, 0, 8). 
(b) Which vector in this plane is closest to (1, 0, 0, 0, 0)? 


11 If q1 and q2 are orthonormal vectors in $R^{5}$, what combination ____ q1 + ____ q2 
is closest to a given vector b? 

12 If a1, a2, a3 is a basis for $R^{3}$, any vector b can be written as 

$$b = x_1a_1 + x_2a_2 + x_3a_3 \quad\text{or}\quad \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = b.$$

(a) Suppose the a's are orthonormal. Show that $x_1 = a_1^{T}b$. 
(b) Suppose the a's are orthogonal. Show that $x_1 = a_1^{T}b / a_1^{T}a_1$. 
(c) If the a's are independent, $x_1$ is the first component of ____ times b. 


Problems 13-24 are about the Gram-Schmidt process and A = QR. 


13 What multiple of $a = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ should be subtracted from $b = \begin{bmatrix} 4 \\ 0 \end{bmatrix}$ to make the result B 
orthogonal to a? Sketch a figure to show a, b, and B. 


14 Complete the Gram-Schmidt process in Problem 13 by computing $q_1 = a/\|a\|$ and 
$q_2 = B/\|B\|$ and factoring into QR: 

$$\begin{bmatrix} 1 & 4 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} q_1 & q_2 \end{bmatrix} \begin{bmatrix} \|a\| & ? \\ 0 & \|B\| \end{bmatrix}.$$


15 (a) Find orthonormal vectors q1, q2, q3 such that q1, q2 span the column space of 

$$A = \begin{bmatrix} 1 & 1 \\ 2 & -1 \\ -2 & 4 \end{bmatrix}.$$

(b) Which of the four fundamental subspaces contains q3? 

(c) Solve Ax = (1, 2, 7) by least squares. 


16 What multiple of a = (4, 5, 2, 2) is closest to b = (1, 2, 0, 0)? Find orthonormal 
vectors q1 and q2 in the plane of a and b. 


17 Find the projection of b onto the line through a: 

$$a = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} \quad\text{and}\quad p = ? \quad\text{and}\quad e = b - p = ?$$

Compute the orthonormal vectors $q_1 = a/\|a\|$ and $q_2 = e/\|e\|$. 


18 If A = QR then $A^{T}A = R^{T}R$ = ____ triangular times ____ triangular. Gram-Schmidt 
on A corresponds to elimination on $A^{T}A$. Compare the pivots for $A^{T}A$ with 
$\|a\|^{2} = 3$ and $\|e\|^{2} = 8$ in Problem 17: 

$$A = \begin{bmatrix} 1 & 1 \\ 1 & 3 \\ 1 & 5 \end{bmatrix} \quad\text{and}\quad A^{T}A = \begin{bmatrix} 3 & 9 \\ 9 & 35 \end{bmatrix}.$$


19 True or false (give an example in either case): 

(a) The inverse of an orthogonal matrix is an orthogonal matrix. 

(b) If Q (3 by 2) has orthonormal columns then $\|Qx\|$ always equals $\|x\|$. 


20 Find an orthonormal basis for the column space of A: 

$$A = \begin{bmatrix} 1 & -2 \\ 1 & 0 \\ 1 & 1 \\ 1 & 3 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} -4 \\ -3 \\ 3 \\ 0 \end{bmatrix}.$$

Then compute the projection of b onto that column space. 


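21 Find orthogonal vectors A, B, C by Gram-Schmidt from 

$$a = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \quad\text{and}\quad c = \begin{bmatrix} 1 \\ 0 \\ 4 \end{bmatrix}.$$


22 Find q1, q2, q3 (orthonormal) as combinations of a, b, c (independent columns). 
Then write A as QR: 

$$A = \begin{bmatrix} 1 & 2 & 4 \\ 0 & 0 & 5 \\ 0 & 3 & 6 \end{bmatrix}.$$


23 (a) Find a basis for the subspace S in $R^{4}$ spanned by all solutions of 
$x_1 + x_2 + x_3 - x_4 = 0$. 
(b) Find a basis for the orthogonal complement $S^{\perp}$. 
(c) Find b1 in S and b2 in $S^{\perp}$ so that b1 + b2 = b = (1, 1, 1, 1). 


24 If ad - bc > 0, the entries in A = QR are 

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \frac{\begin{bmatrix} a & -c \\ c & a \end{bmatrix}}{\sqrt{a^{2} + c^{2}}} \; \frac{\begin{bmatrix} a^{2} + c^{2} & ab + cd \\ 0 & ad - bc \end{bmatrix}}{\sqrt{a^{2} + c^{2}}}.$$

Write down A = QR when a, b, c, d = 2, 1, 1, 1 and also 1, 1, 1, 1. Which entry 
becomes zero when Gram-Schmidt breaks down? 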


Problems 25-28 are about the QR code in equations (4.35)-(4.36). It executes Gram-Schmidt. 


25 Show why C (found via C* in the steps after (4.36)) is equal to C in equation (4.32). 


26 Equation (4.32) subtracts from c its components along A and B. Why not subtract 
the components along a and along b? 


27 Write a working code and apply it to a = (2, 2, -1), b = (0, -3, 3), c = (1, 0, 0). 
What are the q's? 


28 Where are the $mn^{2}$ multiplications in equations (4.35) and (4.36)? 


Problems 29-32 involve orthogonal matrices that are special. 


29 (a) Choose c so that Q is an orthogonal matrix: 

$$Q = c \begin{bmatrix} 1 & -1 & -1 & -1 \\ -1 & 1 & -1 & -1 \\ -1 & -1 & 1 & -1 \\ -1 & -1 & -1 & 1 \end{bmatrix}.$$


(b) Change the first row and column to all 1's and fill in another orthogonal matrix. 


30 Project b = (1, 1, 1, 1) onto the first column in Problem 29(a). Then project b onto 
the plane of the first two columns. 


31 If u is a unit vector, then $Q = I - 2uu^{T}$ is a reflection matrix (Example 4.13). 
Find Q from u = (0, 1) and also from $u = (0, \sqrt{2}/2, \sqrt{2}/2)$. Draw figures to show 
reflections of (x, y) and (x, y, z). 


32 $Q = I - 2uu^{T}$ is a reflection matrix when $u^{T}u = 1$. 

(a) Prove that Qu = -u. The mirror is perpendicular to u. 

(b) Find Qv when $u^{T}v = 0$. The mirror contains v. 


DETERMINANTS 


5.1 The Properties of Determinants 


The determinant of a square matrix is a single number. That number contains an amazing 
amount of information about the matrix. It tells immediately whether the matrix is 
invertible. The determinant is zero when the matrix has no inverse. When A is invertible, the 
determinant of $A^{-1}$ is 1/(det A). If det A = 2 then $\det A^{-1} = \frac{1}{2}$. In fact the determinant 
leads to a formula for every entry in $A^{-1}$. 

This is one use for determinants: to find formulas for inverses and pivots and 
solutions $A^{-1}b$. For a matrix of numbers, we seldom use those formulas. (Or rather, we use 
elimination as the quickest way to evaluate the answer.) For a matrix with entries a, b, c, d, 
its determinant shows how $A^{-1}$ changes as A changes: 


$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad\text{has inverse}\quad A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \tag{5.1}$$


Multiply those matrices to get I. The determinant of A is ad - bc. When det A = 0, we 
are asked to divide by zero and we can't. Then A has no inverse. (The rows are parallel 
when a/c = b/d. This gives ad = bc and a zero determinant.) Dependent rows lead to 
det A = 0. 

There is also a connection to the pivots, which are a and d - (c/a)b. The product of 
the two pivots is the determinant: 


$$a\Big(d - \frac{c}{a}\,b\Big) = ad - bc \quad\text{which is}\quad \det A.$$


After a row exchange the pivots are c and b — (a/c)d. Those pivots multiply to give minus 
the determinant. 


Looking ahead The determinant of an n by n matrix can be found in three ways: 
1 Multiply the n pivots (times 1 or —1). 
2 Add up n! terms (times 1 or —1). 


3 Combine n smaller determinants (times 1 or -1). 

You see that plus or minus signs (the decisions between 1 and -1) play a very big part 
in determinants. That comes from the following rule: 


The determinant changes sign when two rows (or two columns) are exchanged. 


The identity matrix has determinant +1. Exchange two rows and det P = -1. Exchange 
two more rows and the new permutation has det P = +1. Half of all permutations are even 
(det P = 1) and half are odd (det P = -1). Starting from I, half of the P's involve an 
even number of exchanges and half require an odd number. In the 2 by 2 case, ad has a 
plus sign and bc has minus, coming from the row exchange: 


$$\det \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 1 \quad\text{and}\quad \det \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = -1.$$


The other essential rule is linearity, but a warning comes first. Linearity does not 
mean that det(A + B) = det A + det B. This is absolutely false. That kind of linearity is not 
even true when A = I and B = I. The false rule would say that det 2I = 1 + 1 = 2. The 
true rule is $\det 2I = 2^{n}$. Determinants are multiplied by $t^{n}$ (not just by t) when matrices 
are multiplied by t. But we are getting ahead of ourselves. In the choice between defining 
the determinant by its properties or its formulas, we choose its properties: sign reversal 
and linearity. The properties are simple (Section 5.1). They prepare for the formulas 
(Section 5.2). Then come the applications, including these three: 


(A1) Determinants give $A^{-1}$ and $A^{-1}b$ (by Cramer's Rule). 


(A2) The volume of an n-dimensional box is |det A|, when the edges of the box come 
from the rows of A. 


(A3) The numbers $\lambda$ for which $\det(A - \lambda I) = 0$ are the eigenvalues of A. This is the most 
important application and it fills Chapter 6. 


The Properties of the Determinant 


There are three basic properties (rules 1, 2, 3). By using them we can compute the 
determinant of any square matrix A. This number is written in two ways, det A and |A|. Notice: 
Brackets for the matrix, straight bars for its determinant. When A is a 2 by 2 matrix, the 
three properties lead to the answer we expect: 


$$\text{The determinant of } \begin{bmatrix} a & b \\ c & d \end{bmatrix} \text{ is } \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$


We will check each rule against this 2 by 2 formula, but do not forget: The rules apply to 
any n by n matrix. When we prove that properties 4-10 follow from 1-3, the proof must 
apply to all square matrices. 

1 The determinant is a linear function of the first row (when the other rows stay fixed). 
If the first row is multiplied by t, the determinant is multiplied by t. If first rows are added, 
determinants are added. This rule only applies when the other rows do not change! Notice 
how [c d] stays the same: 


$$\text{multiply row 1 by } t: \qquad \begin{vmatrix} ta & tb \\ c & d \end{vmatrix} = t \begin{vmatrix} a & b \\ c & d \end{vmatrix}$$

$$\text{add row 1 of } A \text{ to row 1 of } A': \qquad \begin{vmatrix} a + a' & b + b' \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} + \begin{vmatrix} a' & b' \\ c & d \end{vmatrix}.$$


In the first case, both sides are tad - tbc. Then t factors out. In the second case, both sides 
are ad + a'd - bc - b'c. These rules still apply when A is n by n, and the last n - 1 rows 
don't change. May we emphasize this with numbers: 


$$\begin{vmatrix} t & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} = t \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} \quad\text{and}\quad \begin{vmatrix} 1 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} = \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} + \begin{vmatrix} 0 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix}.$$


By itself, rule 1 does not say what any of those determinants are. It just says that the 
determinants must pass these two tests for linearity. 

Combining multiplication and addition, we get any linear combination in the first 
row: t(row 1 of A) + t'(row 1 of A'). With this combined row, the determinant is t times 
det A plus t' times det A'. Eventually the other rows have to get into the picture. Rule 2 
does that. 


2 The determinant changes sign when two rows are exchanged (sign reversal): 


$$\text{Check:} \quad \begin{vmatrix} c & d \\ a & b \end{vmatrix} = - \begin{vmatrix} a & b \\ c & d \end{vmatrix} \quad (\text{both sides equal } bc - ad).$$


Because of this rule, there is nothing special about row 1. The determinant is a linear 
function of each row separately. If row 2 is multiplied by t, so is det A. 


Proof Exchange rows 1 and 2. Then multiply the new row 1 by t. Then exchange back. 
The determinant is multiplied by -1 then t then -1, in other words by t, as we expected. 


To operate on any row we can exchange it with the first row. Then use rule 1 for the first 
row and exchange back. This still does not mean that det 2I = 2 det I. To obtain 2I we 
have to multiply both rows by 2, and the factor 2 comes out both times: 


$$\begin{vmatrix} 2 & 0 \\ 0 & 2 \end{vmatrix} = 2^{2} = 4 \quad\text{and}\quad \begin{vmatrix} t & 0 \\ 0 & t \end{vmatrix} = t^{2}.$$

This is just like area and volume. Expand a rectangle by 2 and its area increases by 4. 
Expand an n-dimensional box by t and its volume increases by $t^{n}$. The connection is no 
accident: we will see how determinants equal volumes. 

Property 3 (the easiest rule) matches the determinant of I with the volume of a unit cube. 


3 The determinant of the n by n identity matrix is 1. 


Rules 1 and 2 left a scaling constant undecided. Rule 3 decides it by det I = 1. 

Pay special attention to rules 1-3. They completely determine the number det A— 
but for a big matrix that fact is not obvious. We could stop here to find a formula for n by n 
determinants. It would be a little complicated—we prefer to go gradually. Instead we write 
down other properties which follow directly from the first three. These extra rules make 
determinants much easier to work with. 


4 If two rows of A are equal, then det A = 0. 


$$\text{Check 2 by 2:} \quad \begin{vmatrix} a & b \\ a & b \end{vmatrix} = 0.$$


Rule 4 follows from rule 2. (Remember we must use the rules and not the 2 by 2 formula.) 
Exchange the two equal rows. The determinant D is supposed to change sign. But also D 
has to stay the same, because the matrix is not changed. The only number with -D = D 
is D = 0, so this must be the determinant. (Note: In Boolean algebra the reasoning fails, 
because -1 = 1. Then D is defined by rules 1, 3, 4.) 

A matrix with two equal rows has no inverse. Rule 4 makes det A = 0. But matrices 
can be singular and determinants can be zero without having equal rows! Rule 5 will be 
the key. 


5 Subtracting a multiple of one row from another row leaves det A unchanged. 


$$\begin{vmatrix} a & b \\ c - \ell a & d - \ell b \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix}.$$

Linearity splits the left side into the right side plus another term $-\ell \begin{vmatrix} a & b \\ a & b \end{vmatrix}$. This extra term 
is zero by rule 4. Therefore rule 5 is correct. Note how only one row changes while the 
others stay the same, as required by rule 1. 


Conclusion The determinant is not changed by the usual elimination steps: det A equals 
det U. If we can find determinants of triangular matrices U, we can find determinants of 
all matrices A. Every row exchange reverses the sign, so always $\det A = \pm\det U$. We 
have narrowed the problem to triangular matrices. 


6 A matrix with a row of zeros has det A = 0. 


$$\begin{vmatrix} 0 & 0 \\ c & d \end{vmatrix} = 0 \quad\text{and}\quad \begin{vmatrix} a & b \\ 0 & 0 \end{vmatrix} = 0.$$


For an easy proof, add some other row to the zero row. The determinant is not changed 
(rule 5). But the matrix now has two equal rows. So det A = 0 by rule 4. 

7 If A is a triangular matrix then $\det A = a_{11}a_{22}\cdots a_{nn}$ = product of diagonal entries. 


$$\begin{vmatrix} a & b \\ 0 & d \end{vmatrix} = ad \quad\text{and also}\quad \begin{vmatrix} a & 0 \\ c & d \end{vmatrix} = ad.$$


Suppose all diagonal entries of A are nonzero. Eliminate the off-diagonal entries by the 
usual steps. (If A is lower triangular, subtract multiples of each row from lower rows. If A 
is upper triangular, subtract from rows above.) By rule 5 the determinant is not changed, 
and now the matrix is diagonal: 


$$\text{We must still prove that} \quad \begin{vmatrix} a_{11} & & 0 \\ & \ddots & \\ 0 & & a_{nn} \end{vmatrix} = a_{11}a_{22}\cdots a_{nn}.$$


For this we apply rules 1 and 3. Factor $a_{11}$ from the first row. Then factor $a_{22}$ from the 
second row. Eventually factor $a_{nn}$ from the last row. The determinant is $a_{11}$ times $a_{22}$ 
times ... times $a_{nn}$ times det I. Then rule 3 (used at last!) is det I = 1. 

What if a diagonal entry of the triangular matrix is zero? Then the matrix is singular. 
Elimination will produce a zero row. By rule 5 the determinant is unchanged, and by 
rule 6 a zero row means det A = 0. Thus rule 7 is proved: the determinants of triangular 
matrices come directly from their main diagonals. 


8 If A is singular then det A = 0. If A is invertible then $\det A \neq 0$. 


$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \text{ is singular if and only if } ad - bc = 0.$$


Proof By elimination go from A to U. If A is singular then U has a zero row. The rules 
give det A = det U = 0. If A is invertible then U has nonzeros (the pivots) along its 
diagonal. The product of nonzero pivots (using rule 7) gives a nonzero determinant: 


$$\det A = \pm\det U = \pm(\text{product of the pivots}).$$


This is the first formula for the determinant. A computer would use it to find det A 
from the pivots. The plus or minus sign depends on whether the number of row exchanges 
is even or odd. In other words, +1 or -1 is the determinant of the permutation matrix P 
that does the row exchanges. With no row exchanges, the number zero is even and P = I 
and det A = det U. Note that always det L = 1, because L is triangular with 1's on the 
diagonal. What we have is this: 


$$\text{If } PA = LU \text{ then } \det P \det A = \det L \det U. \tag{5.2}$$


Again, $\det P = \pm 1$ and $\det A = \pm\det U$. Equation (5.2) is our first case of rule 9. 
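
The pivot formula can be sketched as a short routine. This Python version is an illustration only: the function name `det_by_elimination` is an assumption, exact fractions stand in for the floating-point arithmetic a real program would use, and the pivot search simply takes the first nonzero entry rather than the largest one.

```python
from fractions import Fraction

def det_by_elimination(M):
    """Determinant as (+1 or -1) times the product of the pivots."""
    U = [[Fraction(x) for x in row] for row in M]
    n = len(U)
    sign = 1
    for k in range(n):
        # Look for a row at or below k with a nonzero entry in column k.
        pivot_row = next((i for i in range(k, n) if U[i][k] != 0), None)
        if pivot_row is None:
            return Fraction(0)            # no pivot in this column: A is singular
        if pivot_row != k:
            U[k], U[pivot_row] = U[pivot_row], U[k]
            sign = -sign                  # rule 2: a row exchange reverses the sign
        for i in range(k + 1, n):
            l = U[i][k] / U[k][k]
            # rule 5: subtracting a multiple of row k leaves the determinant unchanged
            U[i] = [uij - l * ukj for uij, ukj in zip(U[i], U[k])]
    product = Fraction(1)
    for k in range(n):
        product *= U[k][k]                # rule 7: triangular det = product of the diagonal
    return sign * product
```

As a usage example, `det_by_elimination([[0, 1], [1, 0]])` needs one row exchange and returns -1, while a matrix with dependent rows such as `[[1, 2], [2, 4]]` returns 0.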
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9 The determinant of AB is the product of the separate determinants: |AB| = |A| |B|. 


$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} \begin{vmatrix} p & q \\ r & s \end{vmatrix} = \begin{vmatrix} ap + br & aq + bs \\ cp + dr & cq + ds \end{vmatrix}.$$


In particular, when the matrix B is $A^{-1}$, the determinant of $A^{-1}$ is 1/det A: 

$$AA^{-1} = I \quad\text{so}\quad (\det A)(\det A^{-1}) = \det I = 1.$$

This product rule is the most intricate so far. We could check the 2 by 2 case by algebra: 

$$(ad - bc)(ps - qr) = (ap + br)(cq + ds) - (aq + bs)(cp + dr).$$


For the n by n case, here is a snappy proof that |AB| = |A| |B|. The idea is to consider 
the ratio D(A) = |AB|/|B|. If this ratio has properties 1-3, which we now check, it 
must equal the determinant |A|. (The case |B| = 0 is separate and easy, because AB is 
singular when B is singular. The rule |AB| = |A| |B| becomes 0 = 0.) Here are the three 
properties of the ratio |AB|/|B|: 


Property 3 (Determinant of I): If A = I then the ratio becomes |B|/|B| = 1. 


Property 2 (Sign reversal): When two rows of A are exchanged, so are the same two rows 
of AB. Therefore |AB| changes sign and so does the ratio |AB|/|B|. 


Property 1 (Linearity): When row 1 of A is multiplied by t, so is row 1 of AB. This 
multiplies |AB| by t and multiplies the ratio by t, as desired. Now suppose row 1 of A is 
added to row 1 of A' (the other rows staying the same throughout). Then row 1 of AB is 
added to row 1 of A'B. By rule 1, the determinants add. After dividing by |B|, the ratios 
add. 


Conclusion This ratio |AB|/|B| has the same three properties that define |A|. Therefore 
it equals |A|. This proves the product rule |AB| = |A| |B|. 


10 The transpose $A^{T}$ has the same determinant as A. 


$$\text{Check:} \quad \begin{vmatrix} a & b \\ c & d \end{vmatrix} = \begin{vmatrix} a & c \\ b & d \end{vmatrix} \quad\text{since both sides equal } ad - bc.$$

The equation $|A^{T}| = |A|$ becomes 0 = 0 when A is singular (we know that $A^{T}$ is also 
singular). Otherwise A has the usual factorization PA = LU. Transposing both sides 
gives $A^{T}P^{T} = U^{T}L^{T}$. The proof of $|A| = |A^{T}|$ comes by using rule 9 for products and 
comparing: 


$$\det P \det A = \det L \det U \quad\text{and}\quad \det A^{T} \det P^{T} = \det U^{T} \det L^{T}.$$


First, $\det L = \det L^{T}$ (both have 1's on the diagonal). Second, $\det U = \det U^{T}$ (transposing 
leaves the main diagonal unchanged, and triangular determinants only involve that 
diagonal). Third, $\det P = \det P^{T}$ (permutations have $P^{T} = P^{-1}$, so $|P|\,|P^{T}| = 1$ by rule 9; 
thus $|P|$ and $|P^{T}|$ both equal 1 or both equal -1). Fourth and finally, the comparison 
proves that det A equals $\det A^{T}$. 


Important comment Rule 10 practically doubles our list of properties. Every rule for the 
rows can apply also to the columns (just by transposing, since $|A| = |A^{T}|$). The 
determinant changes sign when two columns are exchanged. A zero column or two equal columns 
will make the determinant zero. If a column is multiplied by t, so is the determinant. The 
determinant is a linear function of each column separately. 


It is time to stop. The list of properties is long enough. Next we find and use an 
explicit formula for the determinant. 


Problem Set 5.1 


Questions 1—12 are about the rules for determinants. 


1 If a 4 by 4 matrix has det A = 2, find det(2A) and det(-A) and $\det(A^{2})$ and 
$\det(A^{-1})$. 


2 If a 3 by 3 matrix has det A = -3, find det(5A) and det(-A) and $\det(A^{2})$ and 
$\det(A^{-1})$. 


3 True or false, with reason or counterexample: 


(a) The determinant of I + A is 1 + det A. 
(b) The determinant of ABC is |A| |B| |C|. 
(c) The determinant of $A^{4}$ is $|A|^{4}$. 
(d) The determinant of 4A is 4|A|. 


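4 Which row exchanges show that these "reverse identity matrices" $J_3$ and $J_4$ have 
$|J_3| = -1$ but $|J_4| = +1$? 


$$\det \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} = -1 \quad\text{but}\quad \det \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} = +1.$$


5 For n = 5, 6, 7, count the row exchanges to permute the reverse identity $J_n$ to the 
identity matrix $I_n$. Propose a rule for every size n and predict whether $J_{101}$ is even 
or odd. 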


6 Show how Rule 6 (determinant = 0 if a row is all zero) comes from Rule 1. 


7 Prove from the product rule |AB| = |A| |B| that an orthogonal matrix Q has 
determinant 1 or -1. Also prove that $|Q| = |Q^{-1}| = |Q^{T}|$. 

8 Find the determinants of rotations and reflections: 


$$Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad\text{and}\quad Q = \begin{bmatrix} 1 - 2\cos^{2}\theta & -2\cos\theta\sin\theta \\ -2\cos\theta\sin\theta & 1 - 2\sin^{2}\theta \end{bmatrix}.$$


9 Prove that $|A^{T}| = |A|$ by transposing A = QR. (R is triangular and Q is orthogonal; 
note Problem 7.) Why does $|R^{T}| = |R|$? 


10 If the entries in every row of A add to zero, prove that det A = 0. If every row of A 
adds to one, prove that det(A - I) = 0. Does this guarantee that det A = 1? 


11 Suppose that CD = -DC and find the flaw in this reasoning: Taking determinants 
gives |C| |D| = -|D| |C|. Therefore |C| = 0 or |D| = 0. One or both of the 
matrices must be singular. (That is not true.) 


12 The inverse of a 2 by 2 matrix seems to have determinant = 1: 


$$\det A^{-1} = \det \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \frac{ad - bc}{ad - bc} = 1.$$

What is wrong with this calculation? 

Questions 13-26 use the rules to compute specific determinants. 


13 By applying row operations to produce an upper triangular U , compute 


12 3 0 2 —-1 0 0 
2 6 6 1 —] 2 -1 0 
det 100 3 and det 0] > 1 
0 2 0 5 0 O -!I 2 


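14 Use row operations to show that the 3 by 3 "Vandermonde determinant" is 


$$\det \begin{bmatrix} 1 & a & a^{2} \\ 1 & b & b^{2} \\ 1 & c & c^{2} \end{bmatrix} = (b - a)(c - a)(c - b).$$


15 Find the determinants of a rank one matrix and a skew-symmetric matrix: 


$$A = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \begin{bmatrix} 1 & -4 & 5 \end{bmatrix} \quad\text{and}\quad K = \begin{bmatrix} 0 & 1 & 3 \\ -1 & 0 & 4 \\ -3 & -4 & 0 \end{bmatrix}.$$

16 A skew-symmetric matrix has $K^{T} = -K$. Insert a, b, c for 1, 3, 4 in Question 15 
and show that |K| = 0. Write down a 4 by 4 example with |K| = 1. 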


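17 Use row operations to simplify and compute these determinants: 


$$\det \begin{bmatrix} 101 & 201 & 301 \\ 102 & 202 & 302 \\ 103 & 203 & 303 \end{bmatrix} \quad\text{and}\quad \det \begin{bmatrix} 1 & t & t^{2} \\ t & 1 & t \\ t^{2} & t & 1 \end{bmatrix}.$$


18 Find the determinants of U and $U^{-1}$ and $U^{2}$: 


$$U = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix} \quad\text{and}\quad U = \begin{bmatrix} a & b \\ 0 & d \end{bmatrix}.$$


19 Suppose you do two row operations at once, going from 

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad\text{to}\quad \begin{bmatrix} a - Lc & b - Ld \\ c - \ell a & d - \ell b \end{bmatrix}.$$

Find the second determinant. Does it equal ad - bc? 


20 Add row 1 of A to row 2, then subtract row 2 from row 1. Then add row 1 to row 2 
and multiply row 1 by -1 to reach B. Which rules show that 


$$\det B = \begin{vmatrix} c & d \\ a & b \end{vmatrix} \quad\text{equals}\quad -\det A = -\begin{vmatrix} a & b \\ c & d \end{vmatrix}?$$


Those rules could replace Rule 2 in the definition of the determinant. 


21 From ad - bc, find the determinants of A and $A^{-1}$ and $A - \lambda I$: 


$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \quad\text{and}\quad A^{-1} = \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \quad\text{and}\quad A - \lambda I = \begin{bmatrix} 2 - \lambda & 1 \\ 1 & 2 - \lambda \end{bmatrix}.$$

Which two numbers $\lambda$ lead to $\det(A - \lambda I) = 0$? Write down the matrix $A - \lambda I$ for 
each of those numbers $\lambda$. It should not be invertible. 


22 From $A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$ find $A^{2}$ and $A^{-1}$ and $A - \lambda I$ and their determinants. Which two 
numbers $\lambda$ lead to $|A - \lambda I| = 0$? 


23 Elimination reduces A to U. Then A = LU: 


$$A = \begin{bmatrix} 3 & 3 & 4 \\ 6 & 8 & 7 \\ -3 & 5 & -9 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & 4 & 1 \end{bmatrix} \begin{bmatrix} 3 & 3 & 4 \\ 0 & 2 & -1 \\ 0 & 0 & -1 \end{bmatrix} = LU.$$


Find the determinants of L, U, A, $U^{-1}L^{-1}$, and $U^{-1}L^{-1}A$. 


24 If the i, j entry of A is i times j, show that det A = 0. (Exception when A = [1].) 


25 If the i, j entry of A is i + j, show that det A = 0. (Exception when n = 1 or 2.) 

26 Compute the determinants of these matrices by row operations: 


$$A = \begin{bmatrix} 0 & a & 0 \\ 0 & 0 & b \\ c & 0 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & a & 0 & 0 \\ 0 & 0 & b & 0 \\ 0 & 0 & 0 & c \\ d & 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} a & a & a \\ a & b & b \\ a & b & c \end{bmatrix}.$$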


27 True or false (give a reason if true or a 2 by 2 example if false): 


(a) If A is not invertible then AB is not invertible. 
(b) The determinant of A is the product of its pivots. 
(c) The determinant of A - B equals det A - det B. 


(d) AB and BA have the same determinant. 


28 (Calculus question) Show that the partial derivatives of f(A) = ln(det A) give $A^{-1}$! 


$$f(a, b, c, d) = \ln(ad - bc) \quad\text{leads to}\quad \begin{bmatrix} \partial f/\partial a & \partial f/\partial c \\ \partial f/\partial b & \partial f/\partial d \end{bmatrix} = A^{-1}.$$


5.2 Cramer’s Rule, Inverses, and Volumes 


This section is about the applications of determinants, first to Ax = b and $A^{-1}$. In the 
entries of $A^{-1}$, you will see det A in every denominator, because we divide by it. (If it is zero then 
$A^{-1}$ doesn't exist.) Each number in $A^{-1}$ is a determinant divided by a determinant. So is 
every component of $x = A^{-1}b$. 


Start with Cramer's Rule to find x. A neat idea gives the solution immediately. Put x in 
column 1 of I and call that matrix $Z_1$. When you multiply A times $Z_1$, the first column 
becomes Ax which is b: 


$$AZ_1 = A \begin{bmatrix} x_1 & 0 & 0 \\ x_2 & 1 & 0 \\ x_3 & 0 & 1 \end{bmatrix} = \begin{bmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{bmatrix} = B_1. \tag{5.3}$$


We multiplied A times $Z_1$ a column at a time. The first column is Ax, the other columns 
are just copied from A. Now take determinants. The rule is $(\det A)(\det Z_1) = \det B_1$. But 
$Z_1$ is triangular with determinant $x_1$: 


$$(\det A)(x_1) = \det B_1 \quad\text{or}\quad x_1 = \frac{\det B_1}{\det A}. \tag{5.4}$$


This is the first component of x in Cramer's Rule! Changing the first column of A 
produced $B_1$. 

To find $x_2$, put the vector x into the second column of the identity matrix. This new 
matrix is $Z_2$ and its determinant is $x_2$. Now multiply $Z_2$ by A: 


$$AZ_2 = \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \begin{bmatrix} 1 & x_1 & 0 \\ 0 & x_2 & 0 \\ 0 & x_3 & 1 \end{bmatrix} = \begin{bmatrix} a_1 & b & a_3 \end{bmatrix} = B_2. \tag{5.5}$$


In $Z_2$, the vector x replaced a column of I. So in $B_2$, the vector b replaced a column of A. 
Take determinants to find $(\det A)(\det Z_2) = \det B_2$. The determinant of $Z_2$ is $x_2$. This 
gives $x_2$ in Cramer's Rule: 


5B (Cramer's Rule) If det A is not zero, then Ax = b has the unique solution 


$$x_1 = \frac{\det B_1}{\det A}, \qquad x_2 = \frac{\det B_2}{\det A}, \qquad \ldots, \qquad x_n = \frac{\det B_n}{\det A}.$$


The matrix $B_j$ is obtained by replacing the jth column of A by the vector b. 


The computer program for Cramer's rule only needs one formula line: 

    x(j) = det([A(:, 1:j-1) b A(:, j+1:n)])/det(A)


To solve an n by n system, Cramer’s Rule evaluates n + 1 determinants. When each 
one is the sum of n! terms—applying the “big formula” with all permutations—this makes 
(n + 1)! different terms. It would be crazy to solve equations that way. But we do finally 
have an explicit formula for the solution x. 
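
For small systems the rule is easy to try out. The Python sketch below is an illustration under assumptions: the names `det` and `cramer` are hypothetical helpers, the determinant is computed by cofactor expansion along the first row (reasonable only for small n), and the entries are assumed to be integers or fractions so that the ratios stay exact.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along row 1 (fine for small matrices)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def cramer(A, b):
    """Solve Ax = b by x_j = det(B_j) / det(A), where B_j has b in column j of A."""
    d = det(A)
    if d == 0:
        raise ValueError("det A = 0: Cramer's Rule does not apply")
    x = []
    for j in range(len(A)):
        Bj = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(A)]
        x.append(Fraction(det(Bj), d))
    return x

# A small 3 by 3 system with right side (1, 0, 0).
A = [[1, 1, 1], [-2, 1, 0], [-4, 0, 1]]
b = [1, 0, 0]
x = cramer(A, b)
```

For this system det A = 7 and the three numerators are 1, 2, 4, so `x` holds the fractions 1/7, 2/7, 4/7.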


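Example 5.1 Use Cramer's Rule (it needs four determinants) to solve 


$$\begin{aligned} x_1 + x_2 + x_3 &= 1 \\ -2x_1 + x_2 &= 0 \\ -4x_1 + x_3 &= 0. \end{aligned}$$


The first determinant is |A|. It should not be zero. Then the right side (1, 0, 0) goes into 
columns 1, 2, 3 to produce the matrices $B_1$, $B_2$, $B_3$: 


$$|A| = \begin{vmatrix} 1 & 1 & 1 \\ -2 & 1 & 0 \\ -4 & 0 & 1 \end{vmatrix} = 7 \quad\text{and}\quad |B_1| = \begin{vmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} = 1 \quad\text{and}$$

$$|B_2| = \begin{vmatrix} 1 & 1 & 1 \\ -2 & 0 & 0 \\ -4 & 0 & 1 \end{vmatrix} = 2 \quad\text{and}\quad |B_3| = \begin{vmatrix} 1 & 1 & 1 \\ -2 & 1 & 0 \\ -4 & 0 & 0 \end{vmatrix} = 4.$$


Cramer's Rule takes ratios to find the components of x. Always divide by |A|: 


$$x_1 = \frac{|B_1|}{|A|} = \frac{1}{7} \quad\text{and}\quad x_2 = \frac{|B_2|}{|A|} = \frac{2}{7} \quad\text{and}\quad x_3 = \frac{|B_3|}{|A|} = \frac{4}{7}.$$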

This example will be used again, to find the inverse matrix A~!. We always substitute 
the x’s back into the equations, to check the calculations. 
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A Formula for A7! 


In Example 1, the right side b was special. It was (1, 0, 0), the first column of I. The 
solution x must be the first column of A7!: 


1/7 1 1/7 1 0 0 
A| 2/7] =10 is the first column of Aļ|2/7 ...;=]0 1 O}. (5.6) 
4/7 0 AJT 10 O0 1 


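The number 7 is the 3 by 3 determinant |A|. The numbers 1, 2, 4 are cofactors (one size
smaller). Look back at |B1| to see how it reduces to a 2 by 2 determinant:


$$\begin{aligned}
|B_1| = 1 &\quad\text{is the cofactor}\quad C_{11} = \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} \\
|B_2| = 2 &\quad\text{is the cofactor}\quad C_{12} = -\begin{vmatrix} -2 & 0 \\ -4 & 1 \end{vmatrix} \\
|B_3| = 4 &\quad\text{is the cofactor}\quad C_{13} = \begin{vmatrix} -2 & 1 \\ -4 & 0 \end{vmatrix}
\end{aligned}$$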


Main point: The numerators in A^{-1} are cofactors. When b = (1, 0, 0) replaces a column
of A to produce Bj, the determinant is just 1 times a cofactor.


For the second column of A^{-1}, change b to (0, 1, 0). Watch how the determinants of
B1, B2, B3 are cofactors along row 2, including the signs (-)(+)(-):


$$\begin{vmatrix} 0 & 1 & 1 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} = -1
\quad\text{and}\quad
\begin{vmatrix} 1 & 0 & 1 \\ -2 & 1 & 0 \\ -4 & 0 & 1 \end{vmatrix} = 5
\quad\text{and}\quad
\begin{vmatrix} 1 & 1 & 0 \\ -2 & 1 & 1 \\ -4 & 0 & 0 \end{vmatrix} = -4.$$

With these numbers -1, 5, -4 in Cramer’s Rule, the second column of A^{-1} is -1/7, 5/7, -4/7.


For the third column of A^{-1}, the right side is b = (0, 0, 1). The three determinants
become cofactors along the third row. The numbers are -1, -2, 3. We always divide by
|A| = 7. Now we have all columns of A^{-1}:


$$A = \begin{bmatrix} 1 & 1 & 1 \\ -2 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}
\quad\text{times}\quad
A^{-1} = \begin{bmatrix} 1/7 & -1/7 & -1/7 \\ 2/7 & 5/7 & -2/7 \\ 4/7 & -4/7 & 3/7 \end{bmatrix}
\quad\text{equals}\quad I.$$


Summary We found A^{-1} by solving AA^{-1} = I. The three columns of I on the right
led to the columns of A^{-1} on the left. After stating a short formula for A^{-1}, we will give
a direct proof. Then you have two ways to approach A^{-1}: by putting columns of I into
Cramer’s Rule (above), or by cofactors (the lightning way using equation (5.10) below).


5C (Formula for A^{-1}) The i, j entry of A^{-1} is the cofactor C_ji divided by the
determinant of A:

$$(A^{-1})_{ij} = \frac{C_{ji}}{\det A} \quad\text{and}\quad A^{-1} = \frac{C^{\mathrm T}}{\det A}. \tag{5.7}$$

The cofactors C_ij go into the “cofactor matrix” C. Then this matrix is transposed. To
compute the i, j entry of A^{-1}, cross out row j and column i of A. Multiply the determinant
by (-1)^{i+j} to get the cofactor, and divide by det A.


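Example 5.2 The matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ has cofactor matrix $C = \begin{bmatrix} d & -c \\ -b & a \end{bmatrix}$. Look at A times
the transpose of C:


$$AC^{\mathrm T} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
= \begin{bmatrix} ad - bc & 0 \\ 0 & ad - bc \end{bmatrix}. \tag{5.8}$$


The matrix on the right is det A times I. So divide by det A. Then A times C^T/det A is I,
which reveals A^{-1}:


$$A^{-1} \ \text{is}\ \frac{C^{\mathrm T}}{\det A} \quad\text{which is}\quad \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \tag{5.9}$$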


This 2 by 2 example uses letters. The 3 by 3 example used numbers. Inverting a 4 by 4
matrix would need sixteen cofactors (each one is a 3 by 3 determinant). Elimination is
faster, but now we know an explicit formula for A^{-1}.
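
The cofactor formula can be sketched in a few lines of Python. The names `det`, `cofactor` and `inverse_by_cofactors` are ours, and exact fractions are used so the entries come out as 1/7, 2/7, ... rather than rounded decimals. Note the transpose: entry (i, j) of the inverse uses the cofactor C_ji.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 0:
        return 1  # convention for the empty matrix, so 1 by 1 minors work
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def cofactor(a, i, j):
    """Cofactor C_ij: signed determinant after crossing out row i and column j."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]
    return (-1) ** (i + j) * det(minor)

def inverse_by_cofactors(a):
    """Return the transposed cofactor matrix divided by det A."""
    d = det(a)
    if d == 0:
        raise ValueError("det A is zero; A has no inverse")
    n = len(a)
    # entry (i, j) of the inverse is C_ji / det A
    return [[Fraction(cofactor(a, j, i), d) for j in range(n)] for i in range(n)]

A = [[1, 1, 1], [-2, 1, 0], [-4, 0, 1]]
A_inv = inverse_by_cofactors(A)
```

For an n by n matrix this evaluates n^2 cofactors, so it illustrates the formula rather than competing with elimination.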


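Direct proof of the formula A^{-1} = C^T/det A The idea is to multiply A times C^T:


$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} C_{11} & C_{21} & C_{31} \\ C_{12} & C_{22} & C_{32} \\ C_{13} & C_{23} & C_{33} \end{bmatrix}
= \begin{bmatrix} \det A & 0 & 0 \\ 0 & \det A & 0 \\ 0 & 0 & \det A \end{bmatrix}. \tag{5.10}$$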


Row 1 of A times column 1 of the cofactors yields det A on the right:
By cofactors of row 1: a11 C11 + a12 C12 + a13 C13 = det A.


Similarly row 2 of A times column 2 of C^T yields det A. The entries a2j are multiplying
cofactors C2j as they should.


Reason for zeros off the main diagonal in equation (5.10). Rows of A are combined
with cofactors from different rows. Row 2 of A times column 1 of C^T gives zero, but why?


a21 C11 + a22 C12 + a23 C13 = 0. (5.11)


Answer: This is the determinant of a matrix A* which has two equal rows. A* is the
same as A, except that its first row is a copy of its second row. So det A* = 0, which is
equation (5.11). It is the expansion of det A* along row 1, where A* has the same cofactors
C11, C12, C13 as A, because all rows agree after the first row. Thus the remarkable matrix
multiplication (5.10) is correct.

Equation (5.10) immediately gives A^{-1}. We have det A times I on the right side:


$$AC^{\mathrm T} = (\det A) I \quad\text{or}\quad A^{-1} = \frac{C^{\mathrm T}}{\det A}.$$


The 2 by 2 case in equation (5.8) shows the zeros off the main diagonal of AC^T.

Example 5.3 A triangular matrix of 1’s has determinant 1. The inverse matrix contains
cofactors:


$$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}
\quad\text{has inverse}\quad
A^{-1} = \frac{C^{\mathrm T}}{1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix}.$$

Cross out row 1 and column 1 of A to see the cofactor C11 = 1. Now cross out row 1 and
column 2. The 3 by 3 submatrix is still triangular with determinant 1. But the cofactor C12
is -1 because of the sign (-1)^{1+2}. This number -1 goes into the (2, 1) entry of A^{-1}.
Don’t forget to transpose C!

Crossing out row 1 and column 3 leaves a matrix with two columns of 1’s. Its
determinant is C13 = 0. This is the (3, 1) entry of A^{-1}. The (4, 1) entry is zero for
the same reason. The (1, 4) entry of A^{-1} comes from the (4, 1) cofactor, which has a
column of zeros.

The inverse of a triangular matrix is triangular. Cofactors explain why. 


Example 5.4 If all cofactors are nonzero, is A sure to be invertible? No way. 


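Example 5.5 Here is part of a direct computation of A^{-1} by cofactors:


$$A = \begin{bmatrix} 0 & 1 & 3 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{bmatrix}
\quad\text{and}\quad
\begin{aligned} |A| &= 5 \\ C_{12} &= -(-2) \\ C_{22} &= -6 \end{aligned}
\quad\text{and}\quad
A^{-1} = \begin{bmatrix} \times & \times & \times \\ \tfrac{2}{5} & -\tfrac{6}{5} & \times \\ \times & \times & \times \end{bmatrix}.$$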


EIGENVALUES AND 
EIGENVECTORS 


6.1 Introduction to Eigenvalues 


Linear equations Ax = b come from steady state problems. Eigenvalues have their great- 
est importance in dynamic problems. The solution is changing with time—growing or 
decaying or oscillating. We can’t find it by elimination. This chapter enters a new part of 
linear algebra. As for determinants, all matrices are square. 


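A good dynamical model is the sequence of matrices A, A^2, A^3, A^4, .... Suppose you
need the hundredth power A^100. The underlying equation multiplies by A at every step
(and 99 steps is not many in scientific computing). The starting matrix A becomes
unrecognizable after a few steps:


$$\underset{A}{\begin{bmatrix} .9 & .3 \\ .1 & .7 \end{bmatrix}} \quad
\underset{A^2}{\begin{bmatrix} .84 & .48 \\ .16 & .52 \end{bmatrix}} \quad
\underset{A^3}{\begin{bmatrix} .804 & .588 \\ .196 & .412 \end{bmatrix}} \quad
\underset{A^4}{\begin{bmatrix} .7824 & .6528 \\ .2176 & .3472 \end{bmatrix}} \quad \cdots \quad
\underset{A^{100}}{\begin{bmatrix} .75 & .75 \\ .25 & .25 \end{bmatrix}}$$

A^100 was found by using the eigenvalues of A, not by multiplying 100 matrices.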


To explain eigenvalues, we first explain eigenvectors. Almost all vectors, when multiplied
by A, change direction. Certain exceptional vectors x are in the same direction as Ax.
Those are the “eigenvectors.” Multiply an eigenvector by A, and the vector Ax is a
number λ times the original x.

The basic equation is Ax = λx. The number λ is the “eigenvalue.” It tells whether
the special vector x is stretched or shrunk or reversed or left unchanged when it is
multiplied by A. We may find λ = 2 or 1/2 or -1 or 1. The eigenvalue can be zero. The equation
Ax = 0x means that this eigenvector x is in the nullspace.

If A is the identity matrix, every vector is unchanged. All vectors are eigenvectors
for A = I. The eigenvalue (the number lambda) is λ = 1. This is unusual to say the least.
Most 2 by 2 matrices have two eigenvector directions and two eigenvalues. This section
teaches how to compute the x’s and λ’s.

[Figure: the eigenvectors x1 = (.75, .25) and x2 = (1, -1), with Ax1 = x1 and Ax2 = .6x2,
then A^2 x1 = (1)^2 x1 and A^2 x2 = (.6)^2 x2.]

Figure 6.1 The eigenvectors keep their directions. The eigenvalues for A^2 are λ^2.


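For the matrix above, here are eigenvectors x1 and x2. Multiplying those vectors
by A gives x1 and .6x2. The eigenvalues λ are 1 and .6:


$$x_1 = \begin{bmatrix} .75 \\ .25 \end{bmatrix} \quad\text{and}\quad
Ax_1 = \begin{bmatrix} .9 & .3 \\ .1 & .7 \end{bmatrix} \begin{bmatrix} .75 \\ .25 \end{bmatrix} = x_1
\qquad (Ax = x \text{ means that } \lambda_1 = 1)$$

$$x_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \quad\text{and}\quad
Ax_2 = \begin{bmatrix} .9 & .3 \\ .1 & .7 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} .6 \\ -.6 \end{bmatrix}
\qquad (\text{this is } .6\,x_2 \text{ so } \lambda_2 = .6).$$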


If we multiply x1 by A again, we still get x1. Every power of A will give A^n x1 = x1.
Multiplying x2 by A gave .6x2, and if we multiply again we get (.6)^2 x2. When A is
squared, the eigenvectors x1 and x2 stay the same. The λ’s are now 1^2 and (.6)^2. The
eigenvalues are squared! This pattern keeps going, because the eigenvectors stay in their
own directions and never get mixed. The eigenvectors of A^100 are the same x1 and x2. The
eigenvalues of A^100 are 1^100 = 1 and (.6)^100 = very small number.

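Other vectors do change direction. But those other vectors are combinations of the
two eigenvectors. The first column of A is


$$\begin{bmatrix} .9 \\ .1 \end{bmatrix} = x_1 + .15\,x_2 = \begin{bmatrix} .75 \\ .25 \end{bmatrix} + \begin{bmatrix} .15 \\ -.15 \end{bmatrix}.$$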


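Multiplying by A gives the first column of A^2. Do it separately for x1 and .15 x2:


$$A \begin{bmatrix} .9 \\ .1 \end{bmatrix} = \begin{bmatrix} .84 \\ .16 \end{bmatrix}
\quad\text{is really}\quad Ax_1 + .15\,Ax_2 = x_1 + (.15)(.6)\,x_2.$$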


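Each eigenvector is multiplied by its eigenvalue, whenever A is applied. We didn’t need
these eigenvectors to find A^2. But it is the good way to do 99 multiplications:


$$A^{99} \begin{bmatrix} .9 \\ .1 \end{bmatrix}
\quad\text{is really}\quad 1^{99} x_1 + (.15)(.6)^{99} x_2
= \begin{bmatrix} .75 \\ .25 \end{bmatrix} + \begin{bmatrix} \text{very small} \\ \text{vector} \end{bmatrix}.$$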


This is the first column of A^100. The second column is A^99 times column 2 of A. But
column 2 is x1 - .45 x2. Multiply by A^99 to get x1 - (.45)(.6)^99 x2. Again (.6)^99 is
extremely small. So both columns of A^100 are practically equal to x1 = (.75, .25).

The eigenvector x1 is a “steady state” that doesn’t change (because λ1 = 1). The
eigenvector x2 is a “decaying mode” that virtually disappears (because λ2 = .6). The
higher the power of A, the closer its columns to the steady state.

We mention that this particular A is a Markov matrix. Its entries are positive and
every column adds to 1. Those facts guarantee that the largest eigenvalue is λ = 1 (as
we found). The eigenvector for λ = 1 is the steady state, which all columns of A^k will
approach.
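
A small numerical check of this steady state, written as a sketch in plain Python (the helper names `matmul` and `matrix_power` are ours). It multiplies 100 matrices the slow way and compares the first column with the eigenvector formula x1 + (.15)(.6)^99 x2.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matrix_power(a, k):
    """Compute A^k by k repeated multiplications (the slow way)."""
    result = [[float(i == j) for j in range(len(a))] for i in range(len(a))]
    for _ in range(k):
        result = matmul(result, a)
    return result

A = [[0.9, 0.3], [0.1, 0.7]]
A2 = matrix_power(A, 2)
A100 = matrix_power(A, 100)

# Eigenvector route: column 1 of A is x1 + .15*x2, so column 1 of A^100
# is x1 + .15*(.6)^99 * x2, with no matrix multiplications at all.
x1, x2 = [0.75, 0.25], [1.0, -1.0]
col1 = [x1[i] + 0.15 * 0.6 ** 99 * x2[i] for i in range(2)]
```

Both routes agree to rounding error; the second one needs only the two eigenvalues and eigenvectors.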


Example 6.1 The projection matrix $P = \begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix}$ has eigenvectors x1 = (1, 1) and x2 =
(1, -1). For those vectors, Px1 = x1 and Px2 = 0. This matrix illustrates three things at
once:


1 It is a Markov matrix, so λ = 1 is an eigenvalue.
2 It is a singular matrix, so λ = 0 is an eigenvalue.
3 It is a symmetric matrix, so x1 and x2 are perpendicular.


The only possible eigenvalues of a projection matrix are 0 and 1. The eigenvectors for
λ = 0 (which means Px = 0x) fill up the nullspace. The eigenvectors for λ = 1 (which
means Px = x) fill up the column space. The nullspace goes to zero. Vectors in the
column space are projected onto themselves. An in-between vector like v = (3, 1) partly
disappears and partly stays:


$$v = \begin{bmatrix} 1 \\ -1 \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \end{bmatrix}
\quad\text{and}\quad
Pv = \begin{bmatrix} 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \end{bmatrix}.$$


The projection P keeps the column space part and destroys the nullspace part, for
every v. To emphasize: Special properties of a matrix lead to special eigenvalues and
eigenvectors. That is a major theme of this chapter. Here λ = 0 and 1. The next matrix is
also special.


Example 6.2 The reflection matrix $R = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ has eigenvalues 1 and -1. The first
eigenvector is (1, 1), unchanged by R. The second eigenvector is (1, -1); its signs are
reversed by R. A matrix with no negative entries can still have a negative eigenvalue! The
perpendicular eigenvectors are the same x1 and x2 that we found for projection. Behind
that fact is a relation between the matrices R and P:


$$2P - I = R \quad\text{or}\quad
2 \begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \tag{6.2}$$


Here is the point. If Px = λx then 2Px = 2λx. The eigenvalues are doubled when the
matrix is doubled. Now subtract Ix = x. The result is (2P - I)x = (2λ - 1)x. When a
matrix is shifted by I, each λ is shifted by 1. The eigenvectors stay the same.


The eigenvalues are related exactly as the matrices are related:

$$2P - I = R \quad\text{so the eigenvalues of } R \text{ are}\quad
\begin{aligned} 2(1) - 1 &= 1 \\ 2(0) - 1 &= -1. \end{aligned}$$


Similarly the eigenvalues of R^2 are λ^2. In this case R^2 = I and (1)^2 = 1 and (-1)^2 = 1.

[Figure: left panel “Projection onto line” with Px1 = x1; right panel “Reflection across
line” with Rx1 = x1.]

Figure 6.2 Projections have λ = 1 and 0. Reflections have λ = 1 and -1. Perpendicular
eigenvectors.
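
The shift rule can be verified directly. This sketch builds R = 2P - I from the projection matrix with entries 1/2 and applies both matrices to the eigenvectors (1, 1) and (1, -1); the helper `apply` is our own name for matrix-times-vector.

```python
from fractions import Fraction

def apply(m, v):
    """Multiply the matrix m (list of rows) by the vector v."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

half = Fraction(1, 2)
P = [[half, half], [half, half]]
# R = 2P - I: double every entry, then subtract 1 on the diagonal
R = [[2 * P[i][j] - (1 if i == j else 0) for j in range(2)] for i in range(2)]

x1, x2 = [1, 1], [1, -1]
Px1, Px2 = apply(P, x1), apply(P, x2)   # eigenvalues 1 and 0
Rx1, Rx2 = apply(R, x1), apply(R, x2)   # eigenvalues 2(1)-1 and 2(0)-1
```

The eigenvectors are shared, and each eigenvalue λ of P becomes 2λ - 1 for R.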


This example is a permutation as well as a reflection. We used the letter R instead
of P for two reasons. First, P was already taken by the projection. Second, other
reflections have λ = -1, but many permutations don’t. The eigenvectors with λ = 1 are their
own reflection. The eigenvectors with λ = -1 go “through the mirror”: they are reversed
to the other side. The next example is a rotation, and it brings bad news.


Example 6.3 The 90° rotation turns every vector so $Q = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ has no real eigenvectors
or eigenvalues.

This matrix Q rotates every vector in the xy plane. No vector stays in the same
direction (except the zero vector which is useless). There cannot be an eigenvector, unless
we go to imaginary numbers. Which we do.

To see how i can help, look at Q^2. If Q is rotation through 90°, then Q^2 is rotation
through 180°. In other words Q^2 is -I. Its eigenvalues are -1 and -1. (Certainly -Ix =
-1x.) If squaring Q is supposed to square its eigenvalues λ, we must have λ^2 = -1. The
eigenvalues of the 90° rotation matrix Q are +i and -i.

We meet the imaginary number i (with i^2 = -1) also in the eigenvectors of Q:


$$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ i \end{bmatrix} = i \begin{bmatrix} 1 \\ i \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} i \\ 1 \end{bmatrix} = -i \begin{bmatrix} i \\ 1 \end{bmatrix}.$$


Somehow these complex vectors keep their direction as they are rotated. Don’t ask how.
This example makes the all-important point that real matrices can have complex
eigenvalues. It also illustrates two special properties:


1 Q is a skew-symmetric matrix so each λ is pure imaginary.
2 Q is an orthogonal matrix so the absolute value of λ is 1.
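
Python’s built-in complex numbers make these claims easy to test. The sketch below assumes the 90° rotation is written as Q = [[0, 1], [-1, 0]] (our reading of the matrix in Example 6.3) and checks Qx = λx for the two complex eigenvectors; `apply` is our own helper name.

```python
def apply(m, v):
    """Multiply the matrix m (list of rows) by the vector v."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

Q = [[0, 1], [-1, 0]]   # assumed form of the 90 degree rotation

x_plus = [1, 1j]        # candidate eigenvector for lambda = +i
x_minus = [1j, 1]       # candidate eigenvector for lambda = -i

Qx_plus = apply(Q, x_plus)
Qx_minus = apply(Q, x_minus)

# Q squared should be -I (rotation through 180 degrees)
Q2 = [[sum(Q[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
```

Both eigenvalues ±i are pure imaginary and have absolute value 1, matching the two properties listed above.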


A symmetric matrix (A^T = A) can be compared to a real number. A skew-symmetric
matrix (A^T = -A) can be compared to an imaginary number. An orthogonal matrix
(A^T A = I) can be compared to a complex number with |λ| = 1. For the eigenvalues those
are more than analogies: they are theorems to be proved. The eigenvectors for all these
special matrices are perpendicular.


The Equation for the Eigenvalues 


In those examples we could solve Ax = λx by trial and error. Now we use determinants
and linear algebra. This is the key calculation in the whole chapter: to compute the λ’s
and x’s.

First move λx to the left side. Write the equation Ax = λx as (A - λI)x = 0. The
matrix A - λI times the eigenvector x is the zero vector. The eigenvectors make up the
nullspace of A - λI! When we know an eigenvalue λ, we find an eigenvector by solving
(A - λI)x = 0.


Eigenvalues first. If (A - λI)x = 0 has a nonzero solution, A - λI is not invertible. The
matrix A - λI must be singular. The determinant of A - λI must be zero. This is how to
recognize when λ is an eigenvalue:


6A The number λ is an eigenvalue of A if and only if


det(A - λI) = 0. (6.3)


This “characteristic equation” involves only λ, not x. Then each root λ leads to x:
(A - λI)x = 0 or Ax = λx. (6.4)


Since det(A - λI) = 0 is an equation of the nth degree, there are n eigenvalues
(counted with repetitions, and possibly complex).

Example 6.4 $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$ is already singular (zero determinant). Find its λ’s and x’s.
When A is singular, λ = 0 is one of the eigenvalues. The equation Ax = 0x has solutions.
They are the eigenvectors for λ = 0. But to find all λ’s and x’s, begin as always by
subtracting λI from A:


Subtract λ from the diagonal to find $A - \lambda I = \begin{bmatrix} 1 - \lambda & 2 \\ 2 & 4 - \lambda \end{bmatrix}$.


Take the determinant of this matrix. From 1 - λ times 4 - λ, the determinant involves
λ^2 - 5λ + 4. The other term, not containing λ, is 2 times 2. Subtract as in ad - bc:


$$\begin{vmatrix} 1 - \lambda & 2 \\ 2 & 4 - \lambda \end{vmatrix} = (1 - \lambda)(4 - \lambda) - (2)(2) = \lambda^2 - 5\lambda. \tag{6.5}$$


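This determinant λ^2 - 5λ is zero when λ is an eigenvalue. Factoring into λ times λ - 5,
the two roots are λ = 0 and λ = 5:


det(A - λI) = λ^2 - 5λ = 0 yields the eigenvalues λ1 = 0 and λ2 = 5.

Now find the eigenvectors. Solve (A - λI)x = 0 separately for λ1 = 0 and λ2 = 5:


$$(A - 0I)x = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\quad\text{yields the eigenvector}\quad \begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \end{bmatrix} \quad\text{for } \lambda_1 = 0$$

$$(A - 5I)x = \begin{bmatrix} -4 & 2 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\quad\text{yields the eigenvector}\quad \begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad\text{for } \lambda_2 = 5.$$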


We need to emphasize: There is nothing exceptional about λ = 0. Like every other number,
zero might be an eigenvalue and it might not. If A is singular, it is. The eigenvectors fill
the nullspace: Ax = 0x = 0. If A is invertible, zero is not an eigenvalue. We shift A by a
multiple of I to make it singular. In the example, the shifted matrix A - 5I was singular
and 5 was the other eigenvalue.


Summary To solve the eigenvalue problem for an n by n matrix, follow these steps:


1 Compute the determinant of A - λI. With λ subtracted along the diagonal, this
determinant starts with λ^n or -λ^n. It is a polynomial in λ of degree n.


2 Find the roots of this polynomial, so that det(A - λI) = 0. The n roots are the n
eigenvalues of A.


3 For each eigenvalue solve the system (A - λI)x = 0.


Since the determinant is zero, there are solutions other than x = 0. Those x’s are the
eigenvectors.
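
For a 2 by 2 matrix, steps 1 and 2 reduce to the quadratic λ^2 - (trace)λ + det = 0. The following sketch (the function name `eigenvalues_2x2` is ours) solves it with the quadratic formula, using `cmath.sqrt` so that complex roots are allowed.

```python
import cmath

def eigenvalues_2x2(m):
    """Roots of det(A - lambda*I) = lambda^2 - (a + d)*lambda + (ad - bc) = 0."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace - root) / 2, (trace + root) / 2

# The singular matrix with rows (1, 2) and (2, 4): lambda = 0 and 5
lam1, lam2 = eigenvalues_2x2([[1, 2], [2, 4]])

# A 90 degree rotation (assumed written as rows (0, 1) and (-1, 0)): lambda = -i and +i
mu1, mu2 = eigenvalues_2x2([[0, 1], [-1, 0]])
```

For larger n there is no such formula in general; numerical libraries do not find roots of det(A - λI) but work on the matrix directly.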


A note on quick computations, when A is 2 by 2. The determinant of A - λI is a quadratic
(starting with λ^2). By factoring or by the quadratic formula, we find its two roots (the
eigenvalues). Then the eigenvectors come immediately from A - λI. This matrix is
singular, so both rows are multiples of a vector (a, b). The eigenvector is any multiple of
(b, -a). The example had λ = 0 and λ = 5:


λ = 0: rows of A - 0I in the direction (1, 2); eigenvector in the direction (2, -1)
λ = 5: rows of A - 5I in the direction (-4, 2); eigenvector in the direction (2, 4).


Previously we wrote that last eigenvector as (1,2). Both (1,2) and (2,4) are correct. 
There is a whole line of eigenvectors—any nonzero multiple of x is as good as x. Often 
we divide by the length, to make the eigenvector into a unit vector. 
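
The (b, -a) shortcut is short enough to write down as code. This is a sketch under the assumption that the λ passed in really is an eigenvalue of the 2 by 2 matrix; the function name `eigenvector_2x2` is ours.

```python
def eigenvector_2x2(m, lam):
    """Return (b, -a), where (a, b) is a nonzero row of A - lam*I.

    Assumes lam is an eigenvalue, so that A - lam*I is singular and both
    of its rows are multiples of the same vector (a, b).
    """
    (a, b), (c, d) = m
    rows = [(a - lam, b), (c, d - lam)]
    for r, s in rows:
        if r != 0 or s != 0:
            return (s, -r)
    raise ValueError("A - lam*I is the zero matrix; every vector is an eigenvector")

A = [[1, 2], [2, 4]]
v0 = eigenvector_2x2(A, 0)   # first row of A - 0I is (1, 2)
v5 = eigenvector_2x2(A, 5)   # first row of A - 5I is (-4, 2)

# Check A v = lam v for the second one
Av5 = (A[0][0] * v5[0] + A[0][1] * v5[1], A[1][0] * v5[0] + A[1][1] * v5[1])
```

Any nonzero multiple of the returned vector is an equally good eigenvector.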

We end with a warning. Some 2 by 2 matrices have only one line of eigenvectors.
Some n by n matrices don’t have n independent eigenvectors. This can only happen when
two eigenvalues are equal. (On the other hand A = I has equal eigenvalues and plenty of
eigenvectors.) Without n eigenvectors, we don’t have a basis. We can’t write every v as a
combination of eigenvectors. In the language of the next section, we can’t diagonalize the
matrix.

Good News, Bad News


Bad news first: If you add a row of A to another row, the eigenvalues usually change.
Elimination destroys the λ’s. After elimination, the triangular U has its eigenvalues sitting
along the diagonal; they are the pivots. But they are not the eigenvalues of A! Eigenvalues
are changed when row 1 is added to row 2:


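$$A = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \ \text{has } \lambda = 0 \text{ and } \lambda = 2;
\qquad
U = \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix} \ \text{has } \lambda = 0 \text{ and } \lambda = 1.$$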


Good news second: The product λ1 times λ2 and the sum λ1 + λ2 can be found
quickly from the matrix A. Here the product is 0 times 2. That agrees with the determinant
of A (which is 0). The sum of the eigenvalues is 0 + 2. That agrees with the sum down the
main diagonal of A (which is 1 + 1). These quick checks always work:


6B The product of the n eigenvalues equals the determinant of A.


6C The sum of the n eigenvalues equals the sum of the n diagonal entries. This number
is called the trace of A:


λ1 + λ2 + ··· + λn = trace = a11 + a22 + ··· + ann. (6.6)


Those checks are very useful. They are proved in Problems 14-15 and again in the next
section. They don’t remove the pain of computing the λ’s. But when the computation is
wrong, they generally tell us so. For the correct λ’s, go back to det(A - λI) = 0.

The determinant test makes the product of the λ’s equal to the product of the pivots
(assuming no row exchanges). But the sum of the λ’s is not the sum of the pivots, as the
example showed. The individual λ’s have almost nothing to do with the individual pivots.
In this new part of linear algebra, the key equation is really nonlinear: λ multiplies x.
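
The two quick checks can be run on the matrices A and U of this section. The sketch below assumes real eigenvalues (it uses `math.sqrt`), and the helper names are ours.

```python
import math

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def real_eigenvalues_2x2(m):
    """Quadratic formula for lambda^2 - trace*lambda + det = 0 (real roots assumed)."""
    t, d = trace(m), det2(m)
    root = math.sqrt(t * t - 4 * d)
    return (t - root) / 2, (t + root) / 2

A = [[1, -1], [-1, 1]]
U = [[1, -1], [0, 0]]   # after adding row 1 of A to row 2

eig_A = real_eigenvalues_2x2(A)
eig_U = real_eigenvalues_2x2(U)
pivots = [U[0][0], U[1][1]]

product_matches_det = eig_A[0] * eig_A[1] == det2(A)
sum_matches_trace = eig_A[0] + eig_A[1] == trace(A)
```

The product of the pivots (1 times 0) still equals det A, but their sum (1) is not the trace (2).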


Problem Set 6.1 


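1 Find the eigenvalues and the eigenvectors of these two matrices:

$$A = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix} \quad\text{and}\quad A + I = \begin{bmatrix} 2 & 4 \\ 2 & 4 \end{bmatrix}.$$

A + I has the ____ eigenvectors as A. Its eigenvalues are ____ by 1.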


2 Compute the eigenvalues and eigenvectors of A and A^{-1}:


$$A = \begin{bmatrix} 0 & 2 \\ 2 & 3 \end{bmatrix} \quad\text{and}\quad A^{-1} = \begin{bmatrix} -3/4 & 1/2 \\ 1/2 & 0 \end{bmatrix}.$$


A^{-1} has the ____ eigenvectors as A. When A has eigenvalues λ1 and λ2, its inverse
has eigenvalues ____.

3 Compute the eigenvalues and eigenvectors of A and A^2:

$$A = \begin{bmatrix} -1 & 3 \\ 2 & 0 \end{bmatrix} \quad\text{and}\quad A^2 = \begin{bmatrix} 7 & -3 \\ -2 & 6 \end{bmatrix}.$$


A^2 has the same ____ as A. When A has eigenvalues λ1 and λ2, A^2 has eigenvalues ____.

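4 Find the eigenvalues of A and B and A + B:


$$A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad A + B = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$

Eigenvalues of A + B (are equal to)(are not equal to) eigenvalues of A plus eigenvalues
of B.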


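5 Find the eigenvalues of A and B and AB and BA:


$$A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad AB = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \quad\text{and}\quad BA = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}.$$

Eigenvalues of AB (are equal to)(are not equal to) eigenvalues of A times eigenvalues
of B. Eigenvalues of AB (are equal to)(are not equal to) eigenvalues of BA.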


6 Elimination produces A = LU. The eigenvalues of U are on its diagonal; they
are the ____. The eigenvalues of L are on its diagonal; they are all ____. The
eigenvalues of A are not the same as ____.

7 (a) If you know x is an eigenvector, the way to find λ is to ____.


(b) If you know λ is an eigenvalue, the way to find x is to ____.
8 What do you do to Ax = λx, in order to prove (a), (b), and (c)?


(a) λ^2 is an eigenvalue of A^2, as in Problem 3.
(b) λ^{-1} is an eigenvalue of A^{-1}, as in Problem 2.
(c) λ + 1 is an eigenvalue of A + I, as in Problem 1.


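9 Find the eigenvalues and eigenvectors for both of these Markov matrices A and A^∞.
Explain why A^100 is close to A^∞:


$$A = \begin{bmatrix} .6 & .2 \\ .4 & .8 \end{bmatrix} \quad\text{and}\quad A^{\infty} = \begin{bmatrix} 1/3 & 1/3 \\ 2/3 & 2/3 \end{bmatrix}.$$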


10 Find the eigenvalues and eigenvectors for the projection matrices P and P^100:


$$P = \begin{bmatrix} .2 & .4 & 0 \\ .4 & .8 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$


If two eigenvectors share the same λ, so do all their linear combinations. Find an
eigenvector of P with no zero components.


11 From the unit vector u = (1/6, 1/6, 3/6, 5/6) construct the rank one projection matrix
P = uu^T.


(a) Show that Pu = u. Then u is an eigenvector with λ = 1.
(b) If v is perpendicular to u show that Pv = 0. Then λ = 0.
(c) Find three independent eigenvectors of P all with eigenvalue λ = 0.


12 Solve det(Q - λI) = 0 by the quadratic formula to reach λ = cos θ ± i sin θ:


$$Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad\text{rotates the } xy \text{ plane by the angle } \theta.$$


Find the eigenvectors of Q by solving (Q - λI)x = 0. Use i^2 = -1.


13 Every permutation matrix leaves x = (1, 1, ..., 1) unchanged. Then λ = 1. Find
two more λ’s for these permutations:


$$P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \quad\text{and}\quad P = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}.$$


14 Prove that the determinant of A equals the product λ1λ2···λn. Start with the
polynomial det(A - λI) separated into its n factors. Then set λ = ____:

det(A - λI) = (λ1 - λ)(λ2 - λ)···(λn - λ) so det A = ____.


15 The sum of the diagonal entries (the trace) equals the sum of the eigenvalues:

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad\text{has}\quad \det(A - \lambda I) = \lambda^2 - (a + d)\lambda + ad - bc = 0.$$

If A has λ1 = 3 and λ2 = 4 then det(A - λI) = ____. The quadratic formula
gives the eigenvalues λ = (a + d + √____)/2 and λ = ____. Their sum is ____.


16 If A has λ1 = 4 and λ2 = 5 then det(A - λI) = (λ - 4)(λ - 5) = λ^2 - 9λ + 20.
Find three matrices that have trace a + d = 9 and determinant 20 and λ = 4, 5.


17 A 3 by 3 matrix B is known to have eigenvalues 0, 1, 2. This information is enough
to find three of these:

(a) the rank of B

(b) the determinant of B^T B

(c) the eigenvalues of B^T B

(d) the eigenvalues of (B + I)^{-1}.


18 Choose the second row of $A = \begin{bmatrix} 0 & 1 \\ * & * \end{bmatrix}$ so that A has eigenvalues 4 and 7.


19 Choose a, b, c, so that det(A - λI) = 9λ - λ^3. Then the eigenvalues are -3, 0, 3:

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ a & b & c \end{bmatrix}.$$


20 The eigenvalues of A equal the eigenvalues of A^T. This is because det(A - λI)
equals det(A^T - λI). That is true because ____. Show by an example that the
eigenvectors of A and A^T are not the same.


21 Construct any 3 by 3 Markov matrix M: positive entries down each column add to 1.
If e = (1, 1, 1) verify that M^T e = e. By Problem 20, λ = 1 is also an eigenvalue
of M. Challenge: A 3 by 3 singular Markov matrix with trace 1/2 has eigenvalues
λ = ____.


22 Find three 2 by 2 matrices that have λ1 = λ2 = 0. The trace is zero and the
determinant is zero. The matrix A might not be 0 but check that A^2 = 0.


23 This matrix is singular with rank one. Find three λ’s and three eigenvectors:

$$A = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 2 \\ 4 & 2 & 4 \\ 2 & 1 & 2 \end{bmatrix}.$$


24 Suppose A and B have the same eigenvalues λ1, ..., λn with the same independent
eigenvectors x1, ..., xn. Then A = B. Reason: Any vector x is a combination
c1x1 + ··· + cnxn. What is Ax? What is Bx?


25 The block B has eigenvalues 1, 2 and C has eigenvalues 3, 4 and D has
eigenvalues 5, 7. Find the eigenvalues of the 4 by 4 matrix A:


$$A = \begin{bmatrix} B & C \\ 0 & D \end{bmatrix} = \begin{bmatrix} 0 & 1 & 3 & 0 \\ -2 & 3 & 0 & 4 \\ 0 & 0 & 6 & 1 \\ 0 & 0 & 1 & 6 \end{bmatrix}.$$


26 Find the rank and the four eigenvalues of

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}.$$


27 Subtract I from the previous A. Find the λ’s and then the determinant:

$$B = A - I = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}.$$

When A (all ones) is 5 by 5, the eigenvalues of A and B = A - I are ____ and ____.

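28 (Review) Find the eigenvalues of A, B, and C:

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 2 & 0 \\ 3 & 0 & 0 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 2 & 2 & 2 \\ 2 & 2 & 2 \\ 2 & 2 & 2 \end{bmatrix}.$$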


29 When a + b = c + d show that (1, 1) is an eigenvector and find both eigenvalues of


$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

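30 When P exchanges rows 1 and 2 and columns 1 and 2, the eigenvalues don’t change.
Find eigenvectors of A and PAP for λ = 11:


$$A = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 6 & 3 \\ 4 & 8 & 4 \end{bmatrix} \quad\text{and}\quad PAP = \begin{bmatrix} 6 & 3 & 3 \\ 2 & 1 & 1 \\ 8 & 4 & 4 \end{bmatrix}.$$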


31 Suppose A has eigenvalues 0, 3, 5 with independent eigenvectors u, v, w. 


(a) Give a basis for the nullspace and a basis for the column space. 
(b) Find a particular solution to Ax = v + w. Find all solutions. 


(c) Show that Ax = u has no solution. (If it did then ____ would be in the
column space.)


6.2 Diagonalizing a Matrix 


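When x is an eigenvector, multiplication by A is just multiplication by a single number:
Ax = λx. All the difficulties of matrices are swept away. Instead of watching every
interconnection in the system, we follow the eigenvectors separately. It is like having
a diagonal matrix, with no off-diagonal interconnections. A diagonal matrix is easy to
square and easy to invert, because it breaks into 1 by 1 matrices:

$$\begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}^2
= \begin{bmatrix} \lambda_1^2 & & \\ & \ddots & \\ & & \lambda_n^2 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}^{-1}
= \begin{bmatrix} \lambda_1^{-1} & & \\ & \ddots & \\ & & \lambda_n^{-1} \end{bmatrix}.$$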


The point of this section is very direct. The matrix A turns into this diagonal matrix Λ
when we use the eigenvectors properly. This is the matrix form of our key idea. We start
right off with that one essential computation.

6D Diagonalization Suppose the n by n matrix A has n linearly independent
eigenvectors. Put them into the columns of an eigenvector matrix S. Then S^{-1}AS is the
eigenvalue matrix Λ:

$$S^{-1}AS = \Lambda = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}. \tag{6.7}$$


The matrix A is “diagonalized.” We use capital lambda for the eigenvalue matrix,
because of the small λ’s (the eigenvalues) on its diagonal.


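Proof Multiply A times its eigenvectors, which are the columns of S. The first column of
AS is Ax1. That is λ1x1:


$$AS = A \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} = \begin{bmatrix} \lambda_1 x_1 & \cdots & \lambda_n x_n \end{bmatrix}.$$


The trick is to split this matrix AS into S times Λ:


$$\begin{bmatrix} \lambda_1 x_1 & \cdots & \lambda_n x_n \end{bmatrix}
= \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} = S\Lambda.$$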


Keep those matrices in the right order! Then λ1 multiplies the first column x1, as shown.
The diagonalization is complete, and we can write AS = SΛ in two good ways:


AS = SΛ is S^{-1}AS = Λ or A = SΛS^{-1}. (6.8)


The matrix S has an inverse, because its columns (the eigenvectors of A) were assumed to
be linearly independent. Without n independent eigenvectors, we absolutely can’t
diagonalize.


The matrices A and Λ have the same eigenvalues λ1, ..., λn. The eigenvectors are
different. The job of the original eigenvectors was to diagonalize A: those eigenvectors
of A went into S. The new eigenvectors, for the diagonal matrix Λ, are just the columns
of I. By using Λ, we can solve differential equations or difference equations or even
Ax = b.


Example 6.5 The projection matrix $P = \begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix}$ has eigenvalues 1 and 0. The
eigenvectors (1, 1) and (-1, 1) go into S. From S comes S^{-1}. Then S^{-1}PS is diagonal:


$$S^{-1}PS = \begin{bmatrix} .5 & .5 \\ -.5 & .5 \end{bmatrix} \begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = \Lambda.$$

The original projection satisfied P^2 = P. The new projection satisfies Λ^2 = Λ. The
column space has swung around from (1, 1) to (1, 0). The nullspace has swung around
from (-1, 1) to (0, 1). It is “just” a numerical simplification, but it makes all future
computations easy.
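
The diagonalization of this projection can be confirmed with exact arithmetic. In the sketch below the helper names `matmul` and `inverse_2x2` are ours; S holds the eigenvectors (1, 1) and (-1, 1) in its columns.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    """Inverse of a 2 by 2 matrix from the formula [d -b; -c a] / (ad - bc)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

half = Fraction(1, 2)
P = [[half, half], [half, half]]
S = [[1, -1], [1, 1]]               # eigenvectors in the columns
S_inv = inverse_2x2(S)

Lam = matmul(matmul(S_inv, P), S)       # should be diag(1, 0)
P_back = matmul(matmul(S, Lam), S_inv)  # should recover P
```

The second product checks the other form of the same statement, P = SΛS^{-1}.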
Here are four small remarks about diagonalization, before the applications. 


Remark 1 Suppose the numbers λ1, ..., λn are all different. Then it is automatic that the
eigenvectors x1, ..., xn are independent. See 6E below. Therefore any matrix that has no
repeated eigenvalues can be diagonalized.


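Remark 2 The eigenvector matrix S is not unique. We can multiply its columns (the
eigenvectors) by any nonzero constants. Suppose we multiply by 5 and -1. Divide the
rows of S^{-1} by 5 and -1 to find the new inverse:


$$S_{\text{new}}^{-1} P S_{\text{new}} = \begin{bmatrix} .1 & .1 \\ .5 & -.5 \end{bmatrix} \begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix} \begin{bmatrix} 5 & 1 \\ 5 & -1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = \text{same } \Lambda.$$


The extreme case is A = I, when every vector is an eigenvector. Any invertible matrix S
can be the eigenvector matrix. Then S^{-1}IS = I (which is Λ).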


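Remark 3 To diagonalize A we must use an eigenvector matrix. From S^{-1}AS = Λ we
know that AS = SΛ. If the first column of S is y, the first column of AS is Ay. The first
column of SΛ is λ1y. For those to be equal, y must be an eigenvector.

The eigenvectors in S come in the same order as the eigenvalues in Λ. To reverse the
order in S and Λ, put (-1, 1) before (1, 1):


$$\begin{bmatrix} -.5 & .5 \\ .5 & .5 \end{bmatrix} \begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix} \begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \Lambda_{\text{new}}.$$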


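Remark 4 (repeated warning for repeated eigenvalues) Some matrices have too few
eigenvectors. Those matrices are not diagonalizable. Here are two examples:

$$A = \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$

Their eigenvalues happen to be 0 and 0. Nothing is special about zero; it is the repeated λ
that counts. Look for eigenvectors of the second matrix.


$$Ax = 0x \quad\text{means}\quad \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$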


The eigenvector x is a multiple of (1, 0). There is no second eigenvector, so A cannot be 
diagonalized. This matrix is the best example to test any statement about eigenvectors. In 
many true-false questions, this matrix leads to false. 
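
Counting eigenvector directions for a 2 by 2 matrix is a rank computation: the eigenvectors for λ fill the nullspace of A - λI, whose dimension is 2 minus the rank. Here is a minimal sketch of that count (the function name is ours), applied to the matrix with rows (0, 1) and (0, 0) and to the identity.

```python
def independent_eigenvectors_2x2(m, lam):
    """Dimension of the nullspace of A - lam*I for a 2 by 2 matrix A."""
    (a, b), (c, d) = m
    shifted = [[a - lam, b], [c, d - lam]]
    det = shifted[0][0] * shifted[1][1] - shifted[0][1] * shifted[1][0]
    if det != 0:
        return 0  # A - lam*I is invertible, so lam is not an eigenvalue
    if all(entry == 0 for row in shifted for entry in row):
        return 2  # A - lam*I is the zero matrix: every vector is an eigenvector
    return 1      # rank one: a single line of eigenvectors

too_few = independent_eigenvectors_2x2([[0, 1], [0, 0]], 0)   # repeated lambda = 0
plenty = independent_eigenvectors_2x2([[1, 0], [0, 1]], 1)    # repeated lambda = 1
```

Both matrices have a repeated eigenvalue, but only the first one fails to be diagonalizable.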

Remember that there is no connection between invertibility and diagonalizability: 


- Invertibility is concerned with the eigenvalues (zero or not).


- Diagonalizability is concerned with the eigenvectors (too few or enough).

Each eigenvalue has at least one eigenvector! If (A - λI)x = 0 leads you to x = 0, then λ is
not an eigenvalue. Look for a mistake in solving det(A - λI) = 0. If you have eigenvectors
for n different λ’s, those eigenvectors are independent and A is diagonalizable.


6E (Independent x from different λ) Eigenvectors x1, ..., xj that correspond to
distinct (all different) eigenvalues are linearly independent. An n by n matrix with n different
eigenvalues (no repeated λ’s) must be diagonalizable.


Proof Suppose c1x1 + c2x2 = 0. Multiply by A to find c1λ1x1 + c2λ2x2 = 0. Multiply
by λ2 to find c1λ2x1 + c2λ2x2 = 0. Now subtract one from the other:


Subtraction leaves (λ1 - λ2)c1x1 = 0.


Since the λ’s are different and x1 ≠ 0, we are forced to the conclusion that c1 = 0.
Similarly c2 = 0. No other combination gives c1x1 + c2x2 = 0, so the eigenvectors x1
and x2 must be independent.

This proof extends from two eigenvectors to j eigenvectors. Suppose c1x1 + ··· +
cjxj = 0. Multiply by A, then multiply separately by λj, and subtract. This removes xj.
Now multiply by A and by λ_{j-1} and subtract. This removes x_{j-1}. Eventually only a
multiple of x1 is left:


(λ1 - λ2) ··· (λ1 - λj) c1x1 = 0 which forces c1 = 0. (6.9)


Similarly every ci = 0. When the λ’s are all different, the eigenvectors are independent.
With j = n different eigenvalues, the full set of eigenvectors goes into the columns
of S. Then A is diagonalized.
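
For three eigenvectors the elimination can be written out in full. This is only a worked instance of the argument above with j = 3, not a separate proof:

```latex
\begin{aligned}
c_1 x_1 + c_2 x_2 + c_3 x_3 &= 0 \\
\text{multiply by } A:\qquad c_1\lambda_1 x_1 + c_2\lambda_2 x_2 + c_3\lambda_3 x_3 &= 0 \\
\text{multiply by } \lambda_3:\qquad c_1\lambda_3 x_1 + c_2\lambda_3 x_2 + c_3\lambda_3 x_3 &= 0 \\
\text{subtract}:\qquad (\lambda_1 - \lambda_3)c_1 x_1 + (\lambda_2 - \lambda_3)c_2 x_2 &= 0 \\
\text{repeat with } A \text{ and } \lambda_2:\qquad (\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)c_1 x_1 &= 0.
\end{aligned}
```

Since the three λ’s are different and x1 is not zero, the last line forces c1 = 0.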


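Example 6.6 The Markov matrix $A = \begin{bmatrix} .9 & .3 \\ .1 & .7 \end{bmatrix}$ in the last section had $\lambda_1 = 1$ and $\lambda_2 = .6$. 
Here is $A = S\Lambda S^{-1}$: 

$$\begin{bmatrix} .9 & .3 \\ .1 & .7 \end{bmatrix} = \begin{bmatrix} .75 & 1 \\ .25 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & .6 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ .25 & -.75 \end{bmatrix} = S\Lambda S^{-1}.$$

The eigenvectors $(.75, .25)$ and $(1, -1)$ are in the columns of $S$. We know that they are 
also the eigenvectors of $A^2$. Therefore $A^2$ has the same $S$, and the eigenvalues in $\Lambda$ are 
squared: 

$$A^2 = S\Lambda S^{-1}S\Lambda S^{-1} = S\Lambda^2S^{-1}.$$

Just keep going, and you see why the high powers $A^k$ approach a “steady state”: 

$$A^k = S\Lambda^kS^{-1} = \begin{bmatrix} .75 & 1 \\ .25 & -1 \end{bmatrix} \begin{bmatrix} 1^k & 0 \\ 0 & (.6)^k \end{bmatrix} \begin{bmatrix} 1 & 1 \\ .25 & -.75 \end{bmatrix}.$$

As $k$ gets larger, $(.6)^k$ gets smaller. In the limit it disappears completely. That limit is 

$$A^\infty = \begin{bmatrix} .75 & 1 \\ .25 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ .25 & -.75 \end{bmatrix} = \begin{bmatrix} .75 & .75 \\ .25 & .25 \end{bmatrix}.$$

The limit has the eigenvector $x_1$ in both columns. We saw this steady state in the last 
section. Now we see it more quickly from powers of $A = S\Lambda S^{-1}$. 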
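
The approach to the steady state can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the eigenvector matrix with columns (.75, .25) and (1, −1) and the eigenvalues 1 and .6 quoted in Example 6.6, uses exact fractions, and the helper names (`matmul`, `power_via_eigen`, `power_by_multiplication`) are illustrative choices.

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Eigenvectors (.75, .25) and (1, -1) in the columns of S; eigenvalues 1 and .6.
S = [[Fr(3, 4), Fr(1)], [Fr(1, 4), Fr(-1)]]
S_inv = [[Fr(1), Fr(1)], [Fr(1, 4), Fr(-3, 4)]]

def power_via_eigen(k):
    """A^k = S Lambda^k S^{-1}: only the diagonal of Lambda is raised to the power k."""
    Lam_k = [[Fr(1), Fr(0)], [Fr(0), Fr(3, 5) ** k]]
    return matmul(matmul(S, Lam_k), S_inv)

def power_by_multiplication(A, k):
    """A^k the slow way, multiplying k copies of A."""
    P = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]
    for _ in range(k):
        P = matmul(P, A)
    return P

A = power_via_eigen(1)                      # S Lambda S^{-1} is A itself
A50 = [[float(x) for x in row] for row in power_via_eigen(50)]
print(A50)                                  # close to [[.75, .75], [.25, .25]]
```

Both routes give the same matrix for every k; the eigenvalue route needs only one scalar power, which is why the limit is visible at a glance.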
Eigenvalues of AB and A + B 


The first guess about the eigenvalues of $AB$ is not true. An eigenvalue $\lambda$ of $A$ times an 
eigenvalue $\beta$ of $B$ usually does not give an eigenvalue of $AB$. It is very tempting to think 
it should. Here is a false proof: 

$$ABx = A\beta x = \beta Ax = \beta\lambda x. \tag{6.10}$$


It seems that $\beta$ times $\lambda$ is an eigenvalue. When $x$ is an eigenvector for $A$ and $B$, this 
proof is correct. The mistake is to expect that $A$ and $B$ automatically share the same 
eigenvector $x$. Usually they don’t. Eigenvectors of $A$ are not generally eigenvectors of $B$. 
In this example $A$ and $B$ have zero eigenvalues, while 1 is an eigenvalue of $AB$: 

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \ \text{and}\ B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}; \quad\text{then}\quad AB = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \ \text{and}\ A + B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

For the same reason, the eigenvalues of $A + B$ are generally not $\lambda + \beta$. Here $\lambda + \beta = 0$ 
while $A + B$ has eigenvalues 1 and $-1$. (At least they add to zero.) 
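
A short numerical check of this example, offered as a sketch only. It assumes the pair $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$, which matches the claims in the text, and computes 2 by 2 eigenvalues from the trace and determinant. The function names are illustrative.

```python
import math

def eigenvalues_2x2(M):
    """Real eigenvalues of a 2x2 matrix from lambda^2 - T*lambda + D = 0."""
    (a, b), (c, d) = M
    T, D = a + d, a * d - b * c
    disc = T * T - 4 * D
    if disc < 0:
        raise ValueError("eigenvalues are complex")
    r = math.sqrt(disc)
    return sorted([(T - r) / 2, (T + r) / 2])

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matadd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]

print(eigenvalues_2x2(A), eigenvalues_2x2(B))   # both matrices have only zero eigenvalues
print(eigenvalues_2x2(matmul(A, B)))            # yet AB has the eigenvalue 1
print(eigenvalues_2x2(matadd(A, B)))            # and A + B has eigenvalues -1 and 1
```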

The false proof suggests what is true. Suppose $x$ really is an eigenvector for both 
$A$ and $B$. Then we do have $ABx = \lambda\beta x$. Sometimes all $n$ eigenvectors are shared, and 
we can multiply eigenvalues. The test for $A$ and $B$ to share eigenvectors is important in 
quantum mechanics (time out to mention this application of linear algebra): 


6F Commuting matrices share eigenvectors Suppose A and B can each be diagonal- 
ized. They share the same eigenvector matrix S if and only if AB = BA. 


The uncertainty principle In quantum mechanics, the position matrix $P$ and the 
momentum matrix $Q$ do not commute. In fact $QP - PQ = I$ (these are infinite matrices). 
Then we cannot have $Px = 0$ at the same time as $Qx = 0$ (unless $x = 0$). If we knew 
the position exactly, we could not also know the momentum exactly. Problem 32 derives 
Heisenberg’s uncertainty principle from the Schwarz inequality. 


Fibonacci Numbers 


We present a famous example, which leads to powers of matrices. The Fibonacci numbers 
start with $F_0 = 0$ and $F_1 = 1$. Then every new $F$ is the sum of the two previous $F$’s: 

$$\text{The sequence } 0, 1, 1, 2, 3, 5, 8, 13, \ldots \text{ comes from } F_{k+2} = F_{k+1} + F_k.$$

These numbers turn up in a fantastic variety of applications. Plants and trees grow in a 
spiral pattern, and a pear tree has 8 growths for every 3 turns. For a willow those numbers 
can be 13 and 5. The champion is a sunflower of Daniel O’Connell, whose seeds chose the 
almost unbelievable ratio $F_{12}/F_{13} = 144/233$. Our problem is more basic. 


Problem: Find the Fibonacci number $F_{100}$. The slow way is to apply the rule $F_{k+2} = 
F_{k+1} + F_k$ one step at a time. By adding $F_6 = 8$ to $F_7 = 13$ we reach $F_8 = 21$. Eventually 
we come to $F_{100}$. Linear algebra gives a better way. 
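The key is to begin with a matrix equation $u_{k+1} = Au_k$. That is a one-step rule 
for vectors, while Fibonacci gave a two-step rule for scalars. We match them by putting 
Fibonacci numbers into the vectors: 

$$\text{Let } u_k = \begin{bmatrix} F_{k+1} \\ F_k \end{bmatrix}. \quad \text{The rule } \begin{aligned} F_{k+2} &= F_{k+1} + F_k \\ F_{k+1} &= F_{k+1} \end{aligned} \ \text{ becomes } \ u_{k+1} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} u_k. \tag{6.11}$$

Every step multiplies by $A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$. After 100 steps we reach $u_{100} = A^{100}u_0$: 

$$u_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad u_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \quad \ldots, \quad u_{100} = \begin{bmatrix} F_{101} \\ F_{100} \end{bmatrix}.$$

The second component of $u_{100}$ will be $F_{100}$. The Fibonacci numbers are in the powers 
of $A$, and we can compute $A^{100}$ without multiplying 100 matrices. 
This problem of $A^{100}$ is just right for eigenvalues. Subtract $\lambda$ from the diagonal of $A$: 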


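$$A - \lambda I = \begin{bmatrix} 1 - \lambda & 1 \\ 1 & -\lambda \end{bmatrix} \quad\text{leads to}\quad \det(A - \lambda I) = \lambda^2 - \lambda - 1.$$

The eigenvalues solve $\lambda^2 - \lambda - 1 = 0$. They come from the quadratic formula 
$\left(-b \pm \sqrt{b^2 - 4ac}\,\right)/2a$: 

$$\lambda_1 = \frac{1 + \sqrt{5}}{2} \approx 1.618 \quad\text{and}\quad \lambda_2 = \frac{1 - \sqrt{5}}{2} \approx -.618.$$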


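These eigenvalues $\lambda_1$ and $\lambda_2$ lead to eigenvectors $x_1$ and $x_2$. This completes step 1: 

$$\begin{bmatrix} 1 - \lambda_1 & 1 \\ 1 & -\lambda_1 \end{bmatrix} \begin{bmatrix} x_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \ \text{when}\ x_1 = \begin{bmatrix} \lambda_1 \\ 1 \end{bmatrix}, \qquad \begin{bmatrix} 1 - \lambda_2 & 1 \\ 1 & -\lambda_2 \end{bmatrix} \begin{bmatrix} x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \ \text{when}\ x_2 = \begin{bmatrix} \lambda_2 \\ 1 \end{bmatrix}.$$

Step 2 finds the combination of those eigenvectors that gives $u_0 = (1, 0)$: 

$$\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{\lambda_1 - \lambda_2} \left( \begin{bmatrix} \lambda_1 \\ 1 \end{bmatrix} - \begin{bmatrix} \lambda_2 \\ 1 \end{bmatrix} \right) \quad\text{or}\quad u_0 = \frac{x_1 - x_2}{\lambda_1 - \lambda_2}. \tag{6.12}$$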


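The final step multiplies by $A^{100}$ to find $u_{100}$. The eigenvectors stay separate! They are 
multiplied by $(\lambda_1)^{100}$ and $(\lambda_2)^{100}$: 

$$u_{100} = \frac{(\lambda_1)^{100}x_1 - (\lambda_2)^{100}x_2}{\lambda_1 - \lambda_2}. \tag{6.13}$$

We want $F_{100}$ = second component of $u_{100}$. The second components of $x_1$ and $x_2$ are 1. 
Substitute the numbers $\lambda_1$ and $\lambda_2$ into equation (6.13), to find $\lambda_1 - \lambda_2 = \sqrt{5}$ and $F_{100}$: 

$$F_{100} = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^{100} - \left( \frac{1 - \sqrt{5}}{2} \right)^{100} \right] \approx 3.54 \cdot 10^{20}. \tag{6.14}$$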
Is this a whole number? Yes. The fractions and square roots must disappear, because 
Fibonacci’s rule $F_{k+2} = F_{k+1} + F_k$ stays with integers. The second term in (6.14) is less 
than $\frac{1}{2}$, so it must move the first term to the nearest whole number: 

$$k\text{th Fibonacci number} = \text{nearest integer to } \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^k. \tag{6.15}$$

The ratio of $F_6$ to $F_5$ is $8/5 = 1.6$. The ratio $F_{101}/F_{100}$ must be very close to $(1 + \sqrt{5})/2$. 
The Greeks called this number the “golden mean.” For some reason a rectangle with sides 
1.618 and 1 looks especially graceful. 
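
The following sketch, which is not from the original text, cross-checks the Fibonacci formulas. It uses exact integer arithmetic with repeated squaring of $A$ (a different shortcut from the eigenvalue route in the text, chosen here because it avoids rounding error) and compares the result with the nearest-integer rule. The function names are illustrative, and the floating-point rule is only trusted for moderate $k$.

```python
import math

def mat_mult(X, Y):
    """Product of two 2x2 integer matrices."""
    return [[X[0][0] * Y[0][0] + X[0][1] * Y[1][0], X[0][0] * Y[0][1] + X[0][1] * Y[1][1]],
            [X[1][0] * Y[0][0] + X[1][1] * Y[1][0], X[1][0] * Y[0][1] + X[1][1] * Y[1][1]]]

def mat_pow(M, k):
    """M^k by repeated squaring (about log2(k) multiplications instead of k)."""
    result = [[1, 0], [0, 1]]
    while k > 0:
        if k & 1:
            result = mat_mult(result, M)
        M = mat_mult(M, M)
        k >>= 1
    return result

def fib_by_matrix(k):
    """F_k is the second component of A^k u_0 with u_0 = (1, 0)."""
    return mat_pow([[1, 1], [1, 0]], k)[1][0]

def fib_by_rounding(k):
    """Nearest integer to golden^k / sqrt(5); floating point, so moderate k only."""
    golden = (1 + math.sqrt(5)) / 2
    return round(golden ** k / math.sqrt(5))

print(fib_by_matrix(8))      # 21
print(fib_by_matrix(100))    # 354224848179261915075, about 3.54e20
```

For large $k$ the rounding rule should be evaluated with exact or extended-precision arithmetic, since double precision carries only about 16 significant digits.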


Matrix Powers $A^k$ 


Fibonacci’s example is a typical difference equation $u_{k+1} = Au_k$. Each step multiplies 
by $A$. The solution is $u_k = A^ku_0$. We want to make clear how diagonalizing the matrix 
gives a quick way to compute $A^k$. 

The eigenvector matrix $S$ produces $A = S\Lambda S^{-1}$. This is a factorization of the matrix, 
like $A = LU$ or $A = QR$. The new factorization is perfectly suited to computing powers, 
because every time $S^{-1}$ multiplies $S$ we get $I$: 

$$A^2 = S\Lambda S^{-1}S\Lambda S^{-1} = S\Lambda^2S^{-1} \quad\text{and}\quad A^k = (S\Lambda S^{-1})\cdots(S\Lambda S^{-1}) = S\Lambda^kS^{-1}. \tag{6.16}$$

The eigenvector matrix for $A^k$ is still $S$, and the eigenvalue matrix is $\Lambda^k$. We knew that. 
The eigenvectors don’t change, and the eigenvalues are taken to the $k$th power. When $A$ is 
diagonalized, $A^ku_0$ is easy to compute: 


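1 Find the eigenvalues and $n$ independent eigenvectors. 
2 Write $u_0$ as a combination $c_1x_1 + \cdots + c_nx_n$ of the eigenvectors. 
3 Multiply each eigenvector $x_i$ by $(\lambda_i)^k$. 
4 Then $u_k$ is the combination $c_1\lambda_1^kx_1 + \cdots + c_n\lambda_n^kx_n$. 


In matrix language $A^k$ is $(S\Lambda S^{-1})^k$ which is $S$ times $\Lambda^k$ times $S^{-1}$. In vector language, 
the eigenvectors in $S$ lead to the $c$’s: 

$$u_0 = c_1x_1 + \cdots + c_nx_n = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}. \quad\text{This says that } u_0 = Sc.$$

The coefficients $c_1, \ldots, c_n$ in Step 2 are $c = S^{-1}u_0$. Then Step 3 multiplies by $\Lambda^k$. Step 4 
collects together the new combination $\sum c_i\lambda_i^kx_i$. It is all in the matrices $S$ and $\Lambda^k$ and $S^{-1}$: 

$$A^ku_0 = S\Lambda^kS^{-1}u_0 = S\Lambda^kc = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} \begin{bmatrix} \lambda_1^k & & \\ & \ddots & \\ & & \lambda_n^k \end{bmatrix} \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}. \tag{6.17}$$

This result is exactly $u_k = c_1\lambda_1^kx_1 + \cdots + c_n\lambda_n^kx_n$. It is the solution to $u_{k+1} = Au_k$. 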
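Example 6.7 Choose a matrix for which $S$ and $\Lambda$ and $S^{-1}$ contain whole numbers: 

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix} \quad\text{has}\quad \lambda_1 = 1 \ \text{and}\ x_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \lambda_2 = 2 \ \text{and}\ x_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$

$A$ is triangular, so its eigenvalues 1 and 2 are on the diagonal. $A^k$ is also triangular, with 1 
and $2^k$ on the diagonal. Those numbers stay separate in $\Lambda^k$. They are combined in $A^k$: 

$$A^k = S\Lambda^kS^{-1} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2^k \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2^k - 1 \\ 0 & 2^k \end{bmatrix}.$$

With $k = 1$ we get $A$. With $k = 0$ we get $I$. With $k = -1$ we get $A^{-1}$. 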
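
The four steps for $u_k = A^ku_0$ can be sketched in a few lines. This is an illustration rather than the book’s own code: it assumes the triangular matrix $A = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}$ with eigenvalues 1, 2 and eigenvectors $(1, 0)$, $(1, 1)$, and the names `solve_2x2`, `u_k` and `step` are hypothetical helpers.

```python
def solve_2x2(S, u):
    """Solve S c = u for a 2x2 matrix S by Cramer's rule (Step 2: c = S^{-1} u_0)."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [(u[0] * d - b * u[1]) / det, (a * u[1] - c * u[0]) / det]

def u_k(eigvals, eigvecs, u0, k):
    """Steps 2-4: expand u_0 in eigenvectors, scale each by lambda^k, recombine."""
    x1, x2 = eigvecs
    S = [[x1[0], x2[0]], [x1[1], x2[1]]]          # eigenvectors in the columns of S
    c = solve_2x2(S, u0)
    return [c[0] * eigvals[0] ** k * x1[i] + c[1] * eigvals[1] ** k * x2[i]
            for i in range(2)]

def step(A, u):
    """One step u_{k+1} = A u_k, used only to check the formula."""
    return [A[0][0] * u[0] + A[0][1] * u[1], A[1][0] * u[0] + A[1][1] * u[1]]

A = [[1, 1], [0, 2]]
eigvals, eigvecs = (1, 2), ((1, 0), (1, 1))       # Step 1, taken from the example

u = [0, 1]
for _ in range(10):                               # the slow way: ten multiplications
    u = step(A, u)

print(u, u_k(eigvals, eigvecs, [0, 1], 10))       # both give the second column of A^10
```

Starting from $u_0 = (0, 1)$ picks out the second column $(2^k - 1, 2^k)$ of $A^k$; starting from $(1, 0)$ picks out the first column.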


Note The zeroth power of every nonsingular matrix is $A^0 = I$. The product $S\Lambda^0S^{-1}$ 
becomes $SIS^{-1}$ which is $I$. Every $\lambda$ to the zeroth power is 1. But the rule breaks down 
when $\lambda = 0$. Then $0^0$ is not determined. We don’t know $A^0$ when $A$ is singular. 


Nondiagonalizable Matrices (Optional) 


Suppose $\lambda = 0$ is an eigenvalue of $A$. We discover that fact in two ways: 

1 Eigenvectors (geometric) There are nonzero solutions to $Ax = 0x$. 
2 Eigenvalues (algebraic) $A$ has a zero determinant and $\det(A - 0I) = 0$. 


The number $\lambda = 0$ may be a simple eigenvalue or a multiple eigenvalue, and we want to 
know its multiplicity. Most eigenvalues have multiplicity $M = 1$ (simple eigenvalues). 
Then there is a single line of eigenvectors, and $\det(A - \lambda I)$ does not have a double factor. 
For exceptional matrices, an eigenvalue (for example $\lambda = 0$) can be repeated. Then there 
are two different ways to determine its multiplicity: 


1 (Geometric multiplicity) Count the independent eigenvectors for $\lambda = 0$. This is the 
dimension of the nullspace of $A$. As always, the answer is $n - r$. 


2 (Algebraic multiplicity) Count how often $\lambda = 0$ is a solution of $\det(A - \lambda I) = 0$. 
If the algebraic multiplicity is $M$, then $\lambda^M$ is a factor of $\det(A - \lambda I)$. 


The following matrix $A$ is the standard example of trouble. Its eigenvalue $\lambda = 0$ is 
repeated. It is a double eigenvalue ($M = 2$) with only one eigenvector ($n - r = 1$). The 
geometric multiplicity can be below the algebraic multiplicity; it is never larger: 

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \quad\text{has rank } r = 1 \text{ but}\quad \det(A - \lambda I) = \begin{vmatrix} -\lambda & 1 \\ 0 & -\lambda \end{vmatrix} = \lambda^2.$$

There “should” be two eigenvectors, because the equation $\lambda^2 = 0$ has a double root. The 
double factor $\lambda^2$ makes $M = 2$. But there is only one eigenvector $x = (1, 0)$. This 
shortage of eigenvectors means that $A$ is not diagonalizable. 
We have to emphasize: There is nothing special about $\lambda = 0$. It makes for easy 
computations, but these three matrices also have the same shortage of eigenvectors. Their 
repeated eigenvalue is $\lambda = 5$: 

$$A = \begin{bmatrix} 5 & 1 \\ 0 & 5 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 6 & -1 \\ 1 & 4 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 7 & 2 \\ -2 & 3 \end{bmatrix}.$$

Those all have $\det(A - \lambda I) = (\lambda - 5)^2$. The algebraic multiplicity is $M = 2$. But $A - 5I$ 
has rank $r = 1$. The geometric multiplicity is $n - r = 1$. There is only one eigenvector, 
and these matrices are not diagonalizable. 
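
The two multiplicities can be compared mechanically for 2 by 2 matrices. The sketch below is an illustration with hypothetical helper names; it assumes the three matrices with repeated eigenvalue 5 are $\begin{bmatrix} 5 & 1 \\ 0 & 5 \end{bmatrix}$, $\begin{bmatrix} 6 & -1 \\ 1 & 4 \end{bmatrix}$ and $\begin{bmatrix} 7 & 2 \\ -2 & 3 \end{bmatrix}$ (each has trace 10 and determinant 25), and it works with integer entries so the zero tests are exact.

```python
def rank_2x2(M):
    """Rank of a 2x2 matrix with exact (integer) entries."""
    (a, b), (c, d) = M
    if a * d - b * c != 0:
        return 2
    return 0 if a == b == c == d == 0 else 1

def multiplicities(M, lam):
    """Return (algebraic, geometric) multiplicity of lam for a 2x2 matrix M."""
    (a, b), (c, d) = M
    T, D = a + d, a * d - b * c
    p = lam * lam - T * lam + D          # det(M - lam I), the characteristic polynomial
    dp = 2 * lam - T                     # its derivative: zero at a double root
    algebraic = 0 if p != 0 else (2 if dp == 0 else 1)
    shifted = [[a - lam, b], [c, d - lam]]
    geometric = 2 - rank_2x2(shifted)    # dimension of the nullspace, n - r
    return algebraic, geometric

for M in ([[5, 1], [0, 5]], [[6, -1], [1, 4]], [[7, 2], [-2, 3]]):
    print(M, multiplicities(M, 5))       # (2, 1): repeated eigenvalue, one eigenvector

print(multiplicities([[5, 0], [0, 5]], 5))   # (2, 2): 5I is already diagonal
```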


Problem Set 6.2 


Questions 1-8 are about the eigenvalue and eigenvector matrices. 


1 Factor these two matrices into $A = S\Lambda S^{-1}$: 


l 2 1 1 
A=| 4 | and a=|) = 


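2 If $A = S\Lambda S^{-1}$ then $A^3 = (\quad)(\quad)(\quad)$ and $A^{-1} = (\quad)(\quad)(\quad)$. 


3 If $A$ has $\lambda_1 = 2$ with eigenvector $x_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\lambda_2 = 5$ with $x_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, use $S\Lambda S^{-1}$ 
to find $A$. No other matrix has the same $\lambda$’s and $x$’s. 


4 Suppose $A = S\Lambda S^{-1}$. What is the eigenvalue matrix for $A + 2I$? What is the 
eigenvector matrix? Check that $A + 2I = (\quad)(\quad)(\quad)^{-1}$. 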


5 True or false: If the columns of S (eigenvectors of A) are linearly independent, then 


(a) A is invertible (b) A is diagonalizable 
(c) S is invertible (d) S is diagonalizable. 


6 If the eigenvectors of $A$ are the columns of $I$, then $A$ is a ____ matrix. If the eigenvector 
matrix $S$ is triangular, then $S^{-1}$ is triangular. Prove that $A$ is also triangular. 


7 Describe all matrices $S$ that diagonalize this matrix $A$: 

$$A = \begin{bmatrix} 4 & 0 \\ 1 & 2 \end{bmatrix}.$$

Then describe all matrices that diagonalize $A^{-1}$. 


8 Write down the most general matrix that has eigenvectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$. 


Questions 9-14 are about Fibonacci and Gibonacci numbers. 


9 For the Fibonacci matrix $A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$, compute $A^2$ and $A^3$ and $A^4$. Then use the text 
and a calculator to find $F_{20}$. 


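10 Suppose each number $G_{k+2}$ is the average of the two previous numbers $G_{k+1}$ 
and $G_k$. Then $G_{k+2} = \frac{1}{2}(G_{k+1} + G_k)$: 

$$\begin{aligned} G_{k+2} &= \tfrac{1}{2}G_{k+1} + \tfrac{1}{2}G_k \\ G_{k+1} &= G_{k+1} \end{aligned} \quad\text{is}\quad \begin{bmatrix} G_{k+2} \\ G_{k+1} \end{bmatrix} = \begin{bmatrix} \quad A \quad \end{bmatrix} \begin{bmatrix} G_{k+1} \\ G_k \end{bmatrix}.$$

(a) Find the eigenvalues and eigenvectors of $A$. 
(b) Find the limit as $n \to \infty$ of the matrices $A^n = S\Lambda^nS^{-1}$. 
(c) If $G_0 = 0$ and $G_1 = 1$ show that the Gibonacci numbers approach $\frac{2}{3}$. 


11 Diagonalize the Fibonacci matrix by completing $S^{-1}$: 

$$\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} \lambda_1 & \lambda_2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} \quad & \quad \\ \quad & \quad \end{bmatrix}.$$

Do the multiplication $S\Lambda^kS^{-1}\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ to find its second component $F_k = (\lambda_1^k - \lambda_2^k)/(\lambda_1 - \lambda_2)$. 


12 The numbers $\lambda_1^k$ and $\lambda_2^k$ satisfy the Fibonacci rule $F_{k+2} = F_{k+1} + F_k$: 

$$\lambda_1^{k+2} = \lambda_1^{k+1} + \lambda_1^k \quad\text{and}\quad \lambda_2^{k+2} = \lambda_2^{k+1} + \lambda_2^k.$$

Prove this by using the original equation for the $\lambda$’s. Then any combination of $\lambda_1^k$ 
and $\lambda_2^k$ satisfies the rule. The combination $F_k = (\lambda_1^k - \lambda_2^k)/(\lambda_1 - \lambda_2)$ gives the right 
start $F_0 = 0$ and $F_1 = 1$. 


13 Suppose Fibonacci had started with $F_0 = 2$ and $F_1 = 1$. The rule $F_{k+2} = F_{k+1} + F_k$ 
is the same so the matrix $A$ is the same. Its eigenvectors add to 

$$x_1 + x_2 = \begin{bmatrix} \frac{1}{2}(1 + \sqrt{5}) \\ 1 \end{bmatrix} + \begin{bmatrix} \frac{1}{2}(1 - \sqrt{5}) \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} F_1 \\ F_0 \end{bmatrix}.$$

After 20 steps the second component of $A^{20}(x_1 + x_2)$ is $(\quad)^{20} + (\quad)^{20}$. Compute 
that number $F_{20}$. 


14 Prove that every third Fibonacci number in $0, 1, 1, 2, 3, \ldots$ is even. 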


Questions 15-18 are about diagonalizability. 


15 True or false: If the eigenvalues of $A$ are 2, 2, 5 then the matrix is certainly 
(a) invertible (b) diagonalizable (c) not diagonalizable. 


16 True or false: If the only eigenvectors of $A$ are multiples of $(1, 4)$ then $A$ has 
(a) no inverse (b) a repeated eigenvalue (c) no diagonalization $S\Lambda S^{-1}$. 


17 Complete these matrices so that $\det A = 25$. Then check that $\lambda = 5$ is repeated: 
the determinant of $A - \lambda I$ is $(\lambda - 5)^2$. Find an eigenvector with $Ax = 5x$. These 
matrices will not be diagonalizable because there is no second line of eigenvectors. 

$$A = \begin{bmatrix} 8 & \quad \\ \quad & 2 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 9 & 4 \\ \quad & 1 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 10 & 5 \\ -5 & \quad \end{bmatrix}.$$


18 The matrix $A = \begin{bmatrix} 3 & 1 \\ 0 & 3 \end{bmatrix}$ is not diagonalizable because the rank of $A - 3I$ is ____. 
Change one entry by .01 to make $A$ diagonalizable. Which entries could you change? 


Questions 19-23 are about powers of matrices. 


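19 $A^k = S\Lambda^kS^{-1}$ approaches the zero matrix as $k \to \infty$ if and only if every $\lambda$ has 
absolute value less than ____. Which of these matrices has $A^k \to 0$? 

$$A = \begin{bmatrix} .6 & .4 \\ .4 & .6 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} .6 & .9 \\ .1 & .6 \end{bmatrix}.$$


20 (Recommended) Find $\Lambda$ and $S$ to diagonalize $A$ in Problem 19. What is the limit 
of $\Lambda^k$ as $k \to \infty$? What is the limit of $S\Lambda^kS^{-1}$? In the columns of this limiting 
matrix you see the ____. 


21 Find $\Lambda$ and $S$ to diagonalize $B$ in Problem 19. What is $B^{10}u_0$ for these $u_0$? 

$$u_0 = \begin{bmatrix} 3 \\ 1 \end{bmatrix} \quad\text{and}\quad u_0 = \begin{bmatrix} 3 \\ -1 \end{bmatrix} \quad\text{and}\quad u_0 = \begin{bmatrix} 6 \\ 0 \end{bmatrix}.$$


22 Diagonalize $A$ and compute $S\Lambda^kS^{-1}$ to prove this formula for $A^k$: 

$$A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \quad\text{has}\quad A^k = \frac{1}{2} \begin{bmatrix} 1 + 3^k & 1 - 3^k \\ 1 - 3^k & 1 + 3^k \end{bmatrix}.$$


23 Diagonalize $B$ and compute $S\Lambda^kS^{-1}$ to prove this formula for $B^k$: 

$$B = \begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix} \quad\text{has}\quad B^k = \begin{bmatrix} 3^k & 3^k - 2^k \\ 0 & 2^k \end{bmatrix}.$$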


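Questions 24-29 are new applications of $A = S\Lambda S^{-1}$. 


24 Suppose that $A = S\Lambda S^{-1}$. Take determinants to prove that $\det A = \lambda_1\lambda_2\cdots\lambda_n$ = 
product of $\lambda$’s. This quick proof only works when ____. 


25 Show that trace $AB$ = trace $BA$, by directly adding the diagonal entries of $AB$ 
and $BA$: 

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} q & r \\ s & t \end{bmatrix}.$$

Choose $A$ as $S$ and $B$ as $\Lambda S^{-1}$. Then $S\Lambda S^{-1}$ has the same trace as $\Lambda S^{-1}S$. The 
trace of $A$ equals the trace of ____ which is ____. 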
26 $AB - BA = I$ is impossible since the left side has trace = ____. But find an 
elimination matrix so that $A = E$ and $B = E^T$ give 

$$AB - BA = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \quad\text{which has trace zero.}$$


27 If $A = S\Lambda S^{-1}$, diagonalize the block matrix $B = \begin{bmatrix} A & 0 \\ 0 & 2A \end{bmatrix}$. Find its eigenvalue and 
eigenvector matrices. 


28 Consider all 4 by 4 matrices $A$ that are diagonalized by the same fixed eigenvector 
matrix $S$. Show that the $A$’s form a subspace ($cA$ and $A_1 + A_2$ have this same $S$). 
What is this subspace when $S = I$? What is its dimension? 


29 Suppose $A^2 = A$. On the left side $A$ multiplies each column of $A$. Which of our four 
subspaces contains eigenvectors with $\lambda = 1$? Which subspace contains eigenvectors 
with $\lambda = 0$? From the dimensions of those subspaces, $A$ has a full set of independent 
eigenvectors and can be diagonalized. 


30 (Recommended) Suppose $Ax = \lambda x$. If $\lambda = 0$ then $x$ is in the nullspace. If $\lambda \neq 0$ 
then $x$ is in the column space. Those spaces have dimensions $(n - r) + r = n$. So 
why doesn’t every square matrix have $n$ linearly independent eigenvectors? 


31 The eigenvalues of $A$ are 1 and 9, the eigenvalues of $B$ are $-1$ and 9: 

$$A = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 4 & 5 \\ 5 & 4 \end{bmatrix}.$$

Find a matrix square root of $A$ from $R = S\sqrt{\Lambda}\,S^{-1}$. Why is there no real matrix 
square root of $B$? 


32 (Heisenberg’s Uncertainty Principle) $AB - BA = I$ can happen for infinite matrices 
with $A = A^T$ and $B = -B^T$. Then 

$$x^Tx = x^TABx - x^TBAx \le 2\|Ax\|\,\|Bx\|.$$

Explain that last step by using the Schwarz inequality. Then the inequality says that 
$\|Ax\|/\|x\|$ times $\|Bx\|/\|x\|$ is at least $\frac{1}{2}$. It is impossible to get the position error 
and momentum error both very small. 


33 If $A$ and $B$ have the same $\lambda$’s with the same independent eigenvectors, their 
factorizations into ____ are the same. So $A = B$. 


34 Suppose the same $S$ diagonalizes both $A$ and $B$, so that $A = S\Lambda_1S^{-1}$ and $B = 
S\Lambda_2S^{-1}$. Prove that $AB = BA$. 


35 If $A = S\Lambda S^{-1}$ show why the product $(A - \lambda_1I)(A - \lambda_2I)\cdots(A - \lambda_nI)$ is the 
zero matrix. The Cayley-Hamilton Theorem says that this product is zero for every 
matrix. 
36 The matrix $A = \begin{bmatrix} -3 & 4 \\ -2 & 3 \end{bmatrix}$ has $\det(A - \lambda I) = \lambda^2 - 1$. Show from Problem 35 that 
$A^2 - I = 0$. Deduce that $A^{-1} = A$ and check that this is correct. 


37 (a) When do the eigenvectors for $\lambda = 0$ span the nullspace $N(A)$? 
(b) When do all the eigenvectors for $\lambda \neq 0$ span the column space $R(A)$? 


6.3 Symmetric Matrices 


The eigenvalues of projection matrices are 1 and 0. The eigenvalues of reflection matrices 
are 1 and —1. Now we open up to all other symmetric matrices. It is no exaggeration to say 
that these are the most important matrices the world will ever see—in the theory of linear 
algebra and also in the applications. We will come immediately to the two basic questions 
in this subject. Not only the questions, but also the answers. 

You can guess the questions. The first is about the eigenvalues. The second is about 
the eigenvectors. What is special about $Ax = \lambda x$ when $A$ is symmetric? In matrix 
language, we are looking for special properties of $\Lambda$ and $S$ when $A = A^T$. 

The diagonalization $A = S\Lambda S^{-1}$ should reflect the fact that $A$ is symmetric. We 
get some hint by transposing $S\Lambda S^{-1}$ to $(S^{-1})^T\Lambda S^T$. Since $A = A^T$ those are the same. 
Possibly $S^{-1}$ in the first form equals $S^T$ in the second form. In that case $S^TS = I$. We are 
near the answers and here they are: 


1 A symmetric matrix has real eigenvalues. 
2 The eigenvectors can be chosen orthonormal. 


Those orthonormal eigenvectors go into the columns of $S$. There are $n$ of them 
(independent because they are orthonormal). Every symmetric matrix can be diagonalized. 
Its eigenvector matrix $S$ becomes an orthogonal matrix $Q$. Orthogonal matrices have 
$Q^{-1} = Q^T$, so what we suspected about $S$ is true. To remember it we write $Q$ in place of $S$, 
when we choose orthonormal eigenvectors. 

Why do we use the word “choose”? Because the eigenvectors do not have to be unit 
vectors. Their lengths are at our disposal. We will choose unit vectors: eigenvectors of 
length one, which are orthonormal and not just orthogonal. Then $A = S\Lambda S^{-1}$ is in its 
special and particular form for symmetric matrices: 


6H (Spectral Theorem) Every symmetric matrix $A = A^T$ has the factorization $Q\Lambda Q^T$ 
with real diagonal $\Lambda$ and orthogonal matrix $Q$: 

$$A = Q\Lambda Q^{-1} = Q\Lambda Q^T \quad\text{with}\quad Q^{-1} = Q^T.$$

It is easy to see that $Q\Lambda Q^T$ is symmetric. Take its transpose. You get $(Q^T)^T\Lambda^TQ^T$ which 
is $Q\Lambda Q^T$ again. So every matrix of this form is symmetric. The harder part is to prove that 
every symmetric matrix has real $\lambda$’s and orthonormal $x$’s. This is the “spectral theorem” 
in mathematics and the “principal axis theorem” in geometry and physics. We approach it 
in three steps: 




1 By an example (which proves nothing, except that the spectral theorem might be 
true) 


2 By calculating the 2 by 2 case (which convinces most fair-minded people) 
3 By a proof when no eigenvalues are repeated (leaving only real diehards). 


The diehards are worried about repeated eigenvalues. Are there still n orthonormal eigen- 
vectors? Yes, there are. They go into the columns of S (which becomes Q). The last page 
before the problems outlines this fourth and final step. 

We now take steps 1 and 2. In a sense they are optional. The 2 by 2 case is mostly 
for fun, since it is included in the final n by n case. 


Example 6.8 Find the $\lambda$’s and $x$’s when $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$ and $A - \lambda I = \begin{bmatrix} 1 - \lambda & 2 \\ 2 & 4 - \lambda \end{bmatrix}$. 

Solution The equation $\det(A - \lambda I) = 0$ is $\lambda^2 - 5\lambda = 0$. The eigenvalues are 0 and 5 
(both real). We can see them directly: $\lambda = 0$ is an eigenvalue because $A$ is singular, and 
$\lambda = 5$ is the other eigenvalue so that $0 + 5$ agrees with $1 + 4$. This is the trace down the 
diagonal of $A$. 

The eigenvectors are $(2, -1)$ and $(1, 2)$, orthogonal but not yet orthonormal. The 
eigenvector for $\lambda = 0$ is in the nullspace of $A$. The eigenvector for $\lambda = 5$ is in the column 
space. We ask ourselves, why are the nullspace and column space perpendicular? The 
Fundamental Theorem says that the nullspace is perpendicular to the row space, not the 
column space. But our matrix is symmetric! Its row and column spaces are the same. 
So the eigenvectors $(2, -1)$ and $(1, 2)$ are perpendicular, which their dot product tells us 
anyway. 

These eigenvectors have length $\sqrt{5}$. Divide them by $\sqrt{5}$ to get unit vectors. Put the 
unit vectors into the columns of $S$ (which is $Q$). Then $A$ is diagonalized: 

$$Q^{-1}AQ = \frac{1}{\sqrt{5}} \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \frac{1}{\sqrt{5}} \begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 5 \end{bmatrix} = \Lambda.$$
Now comes the calculation for any 2 by 2 symmetric matrix. First, real eigenvalues. 
Second, perpendicular eigenvectors. The $\lambda$’s come from 

$$\det \begin{bmatrix} a - \lambda & b \\ b & c - \lambda \end{bmatrix} = \lambda^2 - (a + c)\lambda + (ac - b^2) = 0. \tag{6.18}$$

This quadratic factors into $(\lambda - \lambda_1)(\lambda - \lambda_2)$. The product $\lambda_1\lambda_2$ is the determinant $D = 
ac - b^2$. The sum $\lambda_1 + \lambda_2$ is the trace $T = a + c$. The quadratic formula could produce 
both $\lambda$’s, but we only want to know they are real. 

The test for real roots of $Ax^2 + Bx + C = 0$ is based on $B^2 - 4AC$. This must not 
be negative, or its square root in the quadratic formula would be imaginary. Our equation 
has different letters, $\lambda^2 - T\lambda + D = 0$, so the test is based on $T^2 - 4D$: 
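Real eigenvalues: $T^2 - 4D = (a + c)^2 - 4(ac - b^2)$ must not be negative. Rewrite that 
as $a^2 + 2ac + c^2 - 4ac + 4b^2$. Rewrite again as $(a - c)^2 + 4b^2$. This is not negative! 
So the roots of the quadratic (the eigenvalues) are certainly real. 


Perpendicular eigenvectors: Compute $x_1$ and $x_2$ and their dot product: 

$$\begin{aligned} (A - \lambda_1I)x_1 &= \begin{bmatrix} a - \lambda_1 & b \\ b & c - \lambda_1 \end{bmatrix} \begin{bmatrix} x_1 \end{bmatrix} = 0 \quad\text{so}\quad x_1 = \begin{bmatrix} b \\ \lambda_1 - a \end{bmatrix} \quad\text{from first row} \\ (A - \lambda_2I)x_2 &= \begin{bmatrix} a - \lambda_2 & b \\ b & c - \lambda_2 \end{bmatrix} \begin{bmatrix} x_2 \end{bmatrix} = 0 \quad\text{so}\quad x_2 = \begin{bmatrix} \lambda_2 - c \\ b \end{bmatrix} \quad\text{from second row} \end{aligned} \tag{6.19}$$

When $b$ is zero, $A$ has perpendicular eigenvectors $(1, 0)$ and $(0, 1)$. Otherwise, take 
the dot product of $x_1$ and $x_2$ to prove they are perpendicular: 

$$x_1 \cdot x_2 = b(\lambda_2 - c) + (\lambda_1 - a)b = b(\lambda_1 + \lambda_2 - a - c) = 0. \tag{6.20}$$

This is zero because $\lambda_1 + \lambda_2$ equals the trace $a + c$. Thus $x_1 \cdot x_2 = 0$. 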
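
The 2 by 2 argument can be tried on numbers. The sketch below is an illustration, not the book’s code: it assumes the eigenvector formulas $x_1 = (b, \lambda_1 - a)$ and $x_2 = (\lambda_2 - c, b)$ read off from the first and second rows, requires $b \neq 0$, and the function name is hypothetical.

```python
import math

def symmetric_eigen_2x2(a, b, c):
    """Eigenvalues and eigenvectors of [[a, b], [b, c]] with b != 0."""
    T, D = a + c, a * c - b * b
    disc = T * T - 4 * D              # equals (a - c)^2 + 4 b^2, never negative
    r = math.sqrt(disc)
    lam1, lam2 = (T + r) / 2, (T - r) / 2
    x1 = (b, lam1 - a)                # from the first row of A - lam1 I
    x2 = (lam2 - c, b)                # from the second row of A - lam2 I
    return lam1, lam2, x1, x2

lam1, lam2, x1, x2 = symmetric_eigen_2x2(1, 2, 4)   # the matrix of Example 6.8
dot = x1[0] * x2[0] + x1[1] * x2[1]
print(lam1, lam2)     # 5.0 0.0
print(x1, x2, dot)    # multiples of (1, 2) and (2, -1), with dot product 0
```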


Now comes the general $n$ by $n$ case, with real $\lambda$’s and perpendicular eigenvectors. 

6I Real Eigenvalues The eigenvalues of a real symmetric matrix are real. 


Proof Suppose that $Ax = \lambda x$. Until we know otherwise, $\lambda$ might be a complex number. 
Then $\lambda$ has the form $a + ib$ ($a$ and $b$ real). Its complex conjugate is $\bar{\lambda} = a - ib$. Similarly 
the components of $x$ may be complex numbers, and switching the signs of their imaginary 
parts gives $\bar{x}$. The good thing is that $\bar{\lambda}$ times $\bar{x}$ is the conjugate of $\lambda$ times $x$. So take 
conjugates of $Ax = \lambda x$, remembering that $A = \bar{A} = A^T$ is real and symmetric: 

$$Ax = \lambda x \quad\text{leads to}\quad A\bar{x} = \bar{\lambda}\bar{x}. \quad\text{Transpose to}\quad \bar{x}^TA = \bar{x}^T\bar{\lambda}. \tag{6.21}$$

Now take the dot product of the first equation with $\bar{x}$ and the last equation with $x$: 

$$\bar{x}^TAx = \bar{x}^T\lambda x \quad\text{and also}\quad \bar{x}^TAx = \bar{x}^T\bar{\lambda}x. \tag{6.22}$$

The left sides are the same so the right sides are equal. One equation has $\lambda$, the other 
has $\bar{\lambda}$. They multiply $\bar{x}^Tx$ which is not zero: it is the squared length of the eigenvector. 
Therefore $\lambda$ must equal $\bar{\lambda}$, and $a + ib$ equals $a - ib$. The imaginary part is $b = 0$ and the 
number $\lambda = a$ is real. Q.E.D. 


The eigenvectors come from solving the real equation $(A - \lambda I)x = 0$. So the $x$’s 
are also real. The important fact is that they are perpendicular. 


6J Orthogonal Eigenvectors Eigenvectors of a real symmetric matrix (when they 
correspond to different $\lambda$’s) are always perpendicular. 


When $A$ has real eigenvalues and real orthogonal eigenvectors, then $A = A^T$. 

Proof Suppose $Ax = \lambda_1x$ and $Ay = \lambda_2y$ and $A = A^T$. Take dot products with $y$ and $x$: 

$$(\lambda_1x)^Ty = (Ax)^Ty = x^TA^Ty = x^TAy = x^T\lambda_2y. \tag{6.23}$$

The left side is $x^T\lambda_1y$, the right side is $x^T\lambda_2y$. Since $\lambda_1 \neq \lambda_2$, this proves that $x^Ty = 0$. 
Eigenvectors are perpendicular. 


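Example 6.9 Find the $\lambda$’s and $x$’s for this symmetric matrix with trace zero: 

$$A = \begin{bmatrix} -3 & 4 \\ 4 & 3 \end{bmatrix} \quad\text{has}\quad \det(A - \lambda I) = \begin{vmatrix} -3 - \lambda & 4 \\ 4 & 3 - \lambda \end{vmatrix} = \lambda^2 - 25.$$

The roots of $\lambda^2 - 25 = 0$ are $\lambda_1 = 5$ and $\lambda_2 = -5$ (both real). The eigenvectors $x_1 = (1, 2)$ 
and $x_2 = (-2, 1)$ are perpendicular. They are not unit vectors, but they can be made into 
unit vectors. Divide by their lengths $\sqrt{5}$. The new $x_1$ and $x_2$ are the columns of $Q$, and 
$Q^{-1}$ equals $Q^T$: 

$$A = Q\Lambda Q^T = \frac{1}{\sqrt{5}} \begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 5 & 0 \\ 0 & -5 \end{bmatrix} \frac{1}{\sqrt{5}} \begin{bmatrix} 1 & 2 \\ -2 & 1 \end{bmatrix}.$$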
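This illustrates the rule to remember. Every 2 by 2 symmetric matrix looks like 

$$A = Q\Lambda Q^T = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} \lambda_1 & \\ & \lambda_2 \end{bmatrix} \begin{bmatrix} x_1^T \\ x_2^T \end{bmatrix}. \tag{6.24}$$

One more step. The columns $x_1$ and $x_2$ times the rows $\lambda_1x_1^T$ and $\lambda_2x_2^T$ produce $A$: 

$$A = \lambda_1x_1x_1^T + \lambda_2x_2x_2^T. \tag{6.25}$$

This is the great factorization $Q\Lambda Q^T$, written in terms of $\lambda$’s and $x$’s. When the symmetric 
matrix is $n$ by $n$, there are $n$ columns in $Q$ multiplying $n$ rows in $Q^T$. The $n$ pieces are 
$\lambda_ix_ix_i^T$. Those are matrices! 

$$A = \begin{bmatrix} -3 & 4 \\ 4 & 3 \end{bmatrix} = 5 \begin{bmatrix} 1/5 & 2/5 \\ 2/5 & 4/5 \end{bmatrix} - 5 \begin{bmatrix} 4/5 & -2/5 \\ -2/5 & 1/5 \end{bmatrix}. \tag{6.26}$$

On the right, each $x_ix_i^T$ is a projection matrix. It is like $uu^T$ in Chapter 4. The spectral 
theorem for symmetric matrices says that $A$ is a combination of projection matrices: 

$$A = \lambda_1P_1 + \cdots + \lambda_nP_n \qquad \lambda_i = \text{eigenvalue}, \quad P_i = \text{projection onto eigenvector}.$$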
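
The combination of projections can be verified exactly. The sketch below is illustrative only: it assumes the matrix of Example 6.9 is $A = \begin{bmatrix} -3 & 4 \\ 4 & 3 \end{bmatrix}$ with eigenvectors $(1, 2)$ and $(-2, 1)$, builds each projection as $vv^T/v^Tv$ with exact fractions, and uses hypothetical helper names.

```python
from fractions import Fraction as Fr

def projection(v):
    """P = v v^T / (v^T v), the projection onto the line through v."""
    n = v[0] * v[0] + v[1] * v[1]
    return [[Fr(v[i] * v[j], n) for j in range(2)] for i in range(2)]

def combine(c1, P1, c2, P2):
    """The matrix c1*P1 + c2*P2."""
    return [[c1 * P1[i][j] + c2 * P2[i][j] for j in range(2)] for i in range(2)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

P1 = projection([1, 2])      # onto the eigenvector for lambda = 5
P2 = projection([-2, 1])     # onto the eigenvector for lambda = -5
A = combine(5, P1, -5, P2)   # lambda_1 P_1 + lambda_2 P_2

print([[int(x) for x in row] for row in A])   # [[-3, 4], [4, 3]]
```

Because the eigenvectors are perpendicular, the projections also satisfy $P_1P_2 = 0$ and $P_1 + P_2 = I$.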


Complex Eigenvalues of Real Matrices 


Equation (6.21) went from $Ax = \lambda x$ to $A\bar{x} = \bar{\lambda}\bar{x}$. In the end, $\lambda$ and $x$ were real. Those two 
equations were the same. But a nonsymmetric matrix can easily produce $\lambda$ and $x$ that are 
complex. In this case, $A\bar{x} = \bar{\lambda}\bar{x}$ is different from $Ax = \lambda x$. It gives us a new eigenvalue 
(which is $\bar{\lambda}$) and a new eigenvector (which is $\bar{x}$): 

For real matrices, complex $\lambda$’s and $x$’s come in “conjugate pairs.” 

$$\text{If}\quad Ax = \lambda x \quad\text{then}\quad A\bar{x} = \bar{\lambda}\bar{x}.$$


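Example 6.10 The rotation $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ has $\lambda_1 = \cos\theta + i\sin\theta$ and $\lambda_2 = \cos\theta - i\sin\theta$. 

Those eigenvalues are conjugate to each other. They are $\lambda$ and $\bar{\lambda}$, because the imaginary 
part $\sin\theta$ switches sign. The eigenvectors must be $x$ and $\bar{x}$ (all this is true because the 
matrix is real): 

$$\begin{aligned} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} 1 \\ -i \end{bmatrix} &= (\cos\theta + i\sin\theta) \begin{bmatrix} 1 \\ -i \end{bmatrix} \\ \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} 1 \\ i \end{bmatrix} &= (\cos\theta - i\sin\theta) \begin{bmatrix} 1 \\ i \end{bmatrix}. \end{aligned} \tag{6.27}$$

One is $Ax = \lambda x$, the other is $A\bar{x} = \bar{\lambda}\bar{x}$. One eigenvector has $-i$, the other has $+i$. For 
this real matrix the eigenvalues and eigenvectors are not real. But they are conjugates. 

By Euler’s formula, $\cos\theta + i\sin\theta$ is the same as $e^{i\theta}$. Its absolute value is $|\lambda| = 1$, 
because $\cos^2\theta + \sin^2\theta = 1$. This fact $|\lambda| = 1$ holds for the eigenvalues of every orthogonal 
matrix, including this rotation. 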

We apologize that a touch of complex numbers slipped in. They are unavoidable 
even when the matrix is real. 
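
Complex arithmetic makes the conjugate pair easy to check. The sketch below is an illustration under stated assumptions: it takes the rotation matrix with rows $(\cos\theta, -\sin\theta)$ and $(\sin\theta, \cos\theta)$, picks an arbitrary angle $\theta = 0.7$, and tests the eigenvector $(1, -i)$ and its conjugate $(1, i)$. The helper `apply` is a hypothetical name.

```python
import cmath
import math

def apply(M, x):
    """Matrix-vector product for a 2x2 matrix and a length-2 (complex) vector."""
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]

theta = 0.7                                   # any angle works; 0.7 is arbitrary
c, s = math.cos(theta), math.sin(theta)
Q = [[c, -s], [s, c]]                         # real rotation matrix

lam = complex(c, s)                           # cos(theta) + i sin(theta)
x = [1, -1j]
x_bar = [1, 1j]                               # conjugate eigenvector

Qx, Qx_bar = apply(Q, x), apply(Q, x_bar)
print(Qx, [lam * xi for xi in x])                             # Q x = lambda x
print(Qx_bar, [lam.conjugate() * xi for xi in x_bar])         # Q x_bar = lambda_bar x_bar
print(abs(lam), abs(lam - cmath.exp(1j * theta)))             # |lambda| = 1, lambda = e^{i theta}
```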


6.4 Positive Definite Matrices 


This section concentrates on symmetric matrices that have positive eigenvalues. If 
symmetry makes a matrix important, this extra property (all $\lambda > 0$) makes it special. When we 
say special, we don’t mean rare. Symmetric matrices with positive eigenvalues enter all 
kinds of applications of linear algebra. 

The first problem is to recognize these matrices. You may say, just find the 
eigenvalues and test $\lambda > 0$. That is exactly what we want to avoid. Calculating eigenvalues is 
work. When the $\lambda$’s are needed, we can compute them. But if we just want to know that 
they are positive, there are faster ways. Here are the two goals of this section: 


1 To find quick tests that guarantee positive eigenvalues. 
2 To explain the applications. 


The matrices are symmetric to start with, so the $\lambda$’s are automatically real numbers. 
An important case is 2 by 2. When does $A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ have $\lambda_1 > 0$ and $\lambda_2 > 0$? 


6L The eigenvalues of $A = A^T$ are positive if and only if $a > 0$ and $ac - b^2 > 0$. 
This test is passed by [$3] and failed by [45]. Also failed by $\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$ even though the 
determinant is 1. 


Proof without computing the $\lambda$’s Suppose $\lambda_1 > 0$ and $\lambda_2 > 0$. Their product $\lambda_1\lambda_2$ 
equals the determinant $ac - b^2$. That must be positive. Therefore $ac$ is also positive. Then 
$a$ and $c$ have the same sign. That sign has to be positive, because $\lambda_1 + \lambda_2$ equals the trace 
$a + c$. Proved so far: Positive $\lambda$’s require $ac - b^2 > 0$ and $a > 0$ (and also $c > 0$). 

The statement was “if and only if,” so there is another half to prove. Start with $a > 0$ 
and $ac - b^2 > 0$. This also ensures $c > 0$. Since $\lambda_1\lambda_2$ equals the determinant $ac - b^2$, the 
$\lambda$’s are both positive or both negative. Since $\lambda_1 + \lambda_2$ equals the trace $a + c > 0$, the $\lambda$’s 
must be positive. End of proof. 


We think of $a$ and $ac - b^2$ as a 1 by 1 and a 2 by 2 determinant. Here is another form 
of the test. Instead of determinants, it checks for positive pivots. 


6M The eigenvalues of $A = A^T$ are positive if and only if the pivots are positive: 

$$a > 0 \quad\text{and}\quad \frac{ac - b^2}{a} > 0.$$

A new proof is unnecessary. The ratio of positive numbers is certainly positive: 

$$a > 0 \ \text{and}\ ac - b^2 > 0 \quad\text{if and only if}\quad a > 0 \ \text{and}\ \frac{ac - b^2}{a} > 0.$$

The point is to recognize that last ratio as the second pivot of $A$: 

$$\begin{bmatrix} a & b \\ b & c \end{bmatrix} \longrightarrow \begin{bmatrix} a & b \\ 0 & c - \frac{b}{a}b \end{bmatrix} \qquad \begin{aligned} &\text{The first pivot is } a \\ &\text{The multiplier is } b/a \\ &\text{The second pivot is } c - \frac{b^2}{a} = \frac{ac - b^2}{a} \end{aligned}$$

This doesn’t add information, to change from $a$ and $ac - b^2$ to their ratio (the pivot). But 
it connects two big parts of linear algebra. Positive eigenvalues mean positive pivots and 
vice versa (for symmetric matrices!). If that holds for $n$ by $n$ symmetric matrices, and it 
does, then we have a quick test for $\lambda > 0$. The pivots are a lot faster to compute than the 
eigenvalues. It is very satisfying to see pivots and determinants and eigenvalues and even 
least squares come together in this course. 


Example 6.11 This matrix has $a = 1$ (positive). But $ac - b^2 = (1)(3) - (2)^2$ is negative: 

$$\begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} \quad\text{has a negative eigenvalue and a negative pivot.}$$

The pivots are 1 and $-1$. The eigenvalues also multiply to give $-1$. One eigenvalue is 
negative (we don’t want its formula, just its sign). 


Next comes a totally different way to look at symmetric matrices with positive 
eigenvalues. From $Ax = \lambda x$, multiply by $x^T$ to get $x^TAx = \lambda x^Tx$. The right side is a positive $\lambda$ 
times a positive $x^Tx = \|x\|^2$. So the left side $x^TAx$ is positive when $x$ is an eigenvector. 
The new idea is that this number $x^TAx$ is positive for all vectors $x$, not just the 
eigenvectors. (Of course $x^TAx = 0$ for the trivial vector $x = 0$.) 

There is a name for matrices with this property $x^TAx > 0$. They are called positive 
definite. We will prove that these are exactly the matrices whose eigenvalues and pivots 
are all positive. 


Definition The matrix $A$ is positive definite if $x^TAx > 0$ for every nonzero vector: 

$$x^TAx = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = ax^2 + 2bxy + cy^2 > 0.$$


Multiplying 1 by 2 times 2 by 2 times 2 by 1 produces a single number. It is $x^TAx$. 
The four entries $a, b, b, c$ give the four parts of that number. From $a$ and $c$ on the diagonal 
come the pure squares $ax^2$ and $cy^2$. From $b$ and $b$ off the diagonal come the cross terms 
$bxy$ and $byx$ (the same). Adding those four parts gives $x^TAx = ax^2 + 2bxy + cy^2$. 

We could have written $x_1$ and $x_2$ for the components of $x$. They will be used often, 
so we avoid subscripts. The number $x^TAx$ is a quadratic function of $x$ and $y$: 

$$f(x, y) = ax^2 + 2bxy + cy^2 \quad\text{is “second degree.”}$$

The rest of this book has been linear (mostly $Ax$). Now the degree has gone from 1 
to 2. Where the first derivatives of $Ax$ are constant, it is the second derivatives of $ax^2 + 
2bxy + cy^2$ that are constant. Those second derivatives are $2a, 2b, 2b, 2c$. They go into 
the matrix $2A$! 


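$$\begin{aligned} \frac{\partial f}{\partial x} &= 2ax + 2by \\ \frac{\partial f}{\partial y} &= 2bx + 2cy \end{aligned} \quad\text{and}\quad \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial y\,\partial x} \\ \dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix} = \begin{bmatrix} 2a & 2b \\ 2b & 2c \end{bmatrix}.$$

This is the 2 by 2 version of what everybody knows for 1 by 1. There the function is $ax^2$, its 
slope is $2ax$, and its second derivative is $2a$. Now the function is $x^TAx$, its first derivatives 
are in the vector $2Ax$, and its second derivatives are in the matrix $2A$. Third derivatives are 
zero. 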

Where does calculus use second derivatives? They give the bending of the graph. 
When $f''$ is positive, the curve bends up from the tangent line. The parabola $y = ax^2$ is 
convex up or concave down according to $a > 0$ or $a < 0$. The point $x = 0$ is a minimum 
point of $y = x^2$ and a maximum point of $y = -x^2$. To decide minimum versus maximum 
for a two-variable function $f(x, y)$, we need to look at a matrix. 

$A$ is positive definite when $f = x^TAx$ has a minimum at $x = y = 0$. At other 
points $f$ is positive, at the origin $f$ is zero. The statement “$A$ is a positive definite matrix” 
is the 2 by 2 version of “$a$ is a positive number.” 


Example 6.12 This matrix is positive definite. The function $f(x, y)$ is positive:


$$A = \begin{bmatrix} 1 & 2 \\ 2 & 7 \end{bmatrix} \quad \text{has pivots 1 and 3.}$$

The function is $x^{T}Ax = x^2 + 4xy + 7y^2$. It is positive because it is a sum of squares:

$$x^2 + 4xy + 7y^2 = (x + 2y)^2 + 3y^2.$$


The pivots 1 and 3 multiply those squares. This is no accident! We prove below, by 
the algebra of “completing the square,” that this always happens. So when the pivots are 
positive, the sum f(x, y) is guaranteed to be positive. 


Comparing Examples 6.11 and 6.12, the only difference is that $a_{22}$ changed from 3
to 7. The borderline is when $a_{22} = 4$. Above 4, the matrix is positive definite. At $a_{22} = 4$,
the borderline matrix is only “semidefinite.” Then $(> 0)$ changes to $(\ge 0)$:


$$\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \quad \text{has pivots 1 and } \_\_\_\_.$$


It has eigenvalues 5 and 0. It has $a > 0$ but $ac - b^2 = 0$.
We will summarize this section so far. We have four ways to recognize a positive 
definite matrix: 


6N When a 2 by 2 symmetric matrix has one of these four properties, it has them all:
1 The eigenvalues are positive.

2 The 1 by 1 and 2 by 2 determinants are positive: $a > 0$ and $ac - b^2 > 0$.

3 The pivots are positive: $a > 0$ and $(ac - b^2)/a > 0$.

4 $x^{T}Ax = ax^2 + 2bxy + cy^2$ is positive except at (0, 0).


When $A$ has any one (therefore all) of these four properties, it is a positive definite matrix.
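
The four tests in 6N are easy to try numerically. The following is a minimal Python sketch, not part of the original text; the names `quadratic` and `four_tests` are illustrative assumptions. The fourth test only samples $f$ on the unit circle, so it is a spot check rather than a proof, and it can miss a semidefinite borderline case.

```python
import math


def quadratic(a, b, c, x, y):
    """Evaluate f(x, y) = a x^2 + 2 b x y + c y^2, which is x^T A x."""
    return a * x * x + 2 * b * x * y + c * y * y


def four_tests(a, b, c, samples=360):
    """Apply the four tests of 6N to the symmetric matrix [[a, b], [b, c]].

    Returns four booleans: (eigenvalues, determinants, pivots, f > 0).
    """
    # Test 1: the eigenvalues are (a + c)/2 plus or minus a square root.
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    eigenvalues_positive = mean - radius > 0

    # Test 2: the 1 by 1 and 2 by 2 upper left determinants.
    det = a * c - b * b
    determinants_positive = a > 0 and det > 0

    # Test 3: the pivots a and (ac - b^2)/a; "and" avoids dividing when a <= 0.
    pivots_positive = a > 0 and det / a > 0

    # Test 4 (spot check only): f is homogeneous of degree 2, so its sign
    # on the unit circle decides its sign everywhere except the origin.
    angles = (2 * math.pi * k / samples for k in range(samples))
    f_positive = all(
        quadratic(a, b, c, math.cos(t), math.sin(t)) > 0 for t in angles
    )

    return (eigenvalues_positive, determinants_positive,
            pivots_positive, f_positive)
```

With $a = 1$, $b = 2$, $c = 7$ all four tests should pass, and with $a = 1$, $b = 4$, $c = 3$ all four should fail. The tests agree with each other, which is the content of 6N.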


Note We deal only with symmetric matrices. The cross derivative $\partial^2 f/\partial x\,\partial y$ always
equals $\partial^2 f/\partial y\,\partial x$. For $f(x, y, z)$ the nine second derivatives fill a symmetric 3 by 3 matrix.
It is positive definite when the three pivots (and the three eigenvalues, and the three
determinants) are positive.


Example 6.13 Is $f(x, y) = x^2 + 8xy + 3y^2$ everywhere positive, except at (0, 0)?


Solution The second derivatives are $f_{xx} = 2$ and $f_{xy} = f_{yx} = 8$ and $f_{yy} = 6$, all
positive. But the test is not positive derivatives. We look for positive definiteness. The
answer is no, this function is not always positive. By trial and error we locate a point
$x = 1$, $y = -1$ where $f(1, -1) = 1 - 8 + 3 = -4$. Better to do linear algebra, and apply
the exact tests to the matrix that produced $f(x, y)$:


$$x^2 + 8xy + 3y^2 = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 1 & 4 \\ 4 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{6.28}$$


The matrix has $ac - b^2 = 3 - 16$. The pivots are 1 and $-13$. The eigenvalues are ____
(we don’t need them). The matrix is not positive definite.

Note how $8xy$ comes from $a_{12} = 4$ above the diagonal and $a_{21} = 4$ symmetrically
below. Please do that matrix multiplication on the right side of (6.28) to see the function
appear.


Main point The sign of $b$ is not the essential thing. The cross derivative $\partial^2 f/\partial x\,\partial y$ can
be positive or negative; it is $b^2$ that enters the tests. The size of $b$, compared to $a$ and $c$,
decides whether the matrix is positive definite and the function has a minimum.


Example 6.14 For which numbers $c$ is $x^2 + 8xy + cy^2$ always positive (or zero)?


Solution The matrix is $A = \begin{bmatrix} 1 & 4 \\ 4 & c \end{bmatrix}$. Again $a = 1$ passes the first test. The second test has
$ac - b^2 = c - 16$. For a positive definite matrix we need $c > 16$.


At the “semidefinite” borderline, $\begin{bmatrix} 1 & 4 \\ 4 & 16 \end{bmatrix}$ has $\lambda = 17$ and 0, determinants 1 and 0,
pivots 1 and ____. The function $x^2 + 8xy + 16y^2$ is $(x + 4y)^2$. Its graph does not go
below zero, but it stays equal to zero along the line $x + 4y = 0$. This is close to positive
definite, but each test just misses.

Instead of two squares (see the next example), $(x + 4y)^2$ has only one square. The
function can’t be negative, but it is zero when $x = 4$ and $y = -1$: a minimum but not a
strict minimum.


Example 6.15 When $A$ is positive definite, write $f(x, y)$ as a sum of two squares.


Solution This is called “completing the square.” The part $ax^2 + 2bxy$ is completed to the
first square $a\left(x + \frac{b}{a}y\right)^2$. Multiplying that out, $ax^2$ and $2bxy$ are correct, but we have to
add in $a\left(\frac{b}{a}y\right)^2$. To stay even, this added amount $b^2y^2/a$ has to be subtracted off again:

$$ax^2 + 2bxy + cy^2 = a\left(x + \frac{b}{a}y\right)^2 + \left(\frac{ac - b^2}{a}\right)y^2.$$


After that gentle touch of algebra, the situation is clearer. There are two perfect
squares (never negative). They are multiplied by two numbers, which could be positive or
negative. Those numbers $a$ and $(ac - b^2)/a$ are the pivots! So positive pivots give a sum
of squares and a positive definite matrix. Think back to the factorization $A = LU$ and also
$A = LDL^{T}$:


$$\begin{aligned} \begin{bmatrix} a & b \\ b & c \end{bmatrix} &= \begin{bmatrix} 1 & 0 \\ b/a & 1 \end{bmatrix} \begin{bmatrix} a & b \\ 0 & (ac - b^2)/a \end{bmatrix} && (\text{this is } LU) \\ &= \begin{bmatrix} 1 & 0 \\ b/a & 1 \end{bmatrix} \begin{bmatrix} a & 0 \\ 0 & (ac - b^2)/a \end{bmatrix} \begin{bmatrix} 1 & b/a \\ 0 & 1 \end{bmatrix} && (\text{this is } LDL^{T}). \end{aligned} \tag{6.29}$$


To complete the square, we dealt with $a$ and $b$ and fixed the rest later. Elimination does
exactly the same. It deals with the first column, and fixes the rest later. We can work with
the function $f(x, y)$ or the matrix. The numbers that come out are identical.

Outside the squares are the pivots. Inside $\left(x + \frac{b}{a}y\right)^2$ are the numbers 1 and $\frac{b}{a}$, which
are in $L$. Every positive definite symmetric matrix factors into $A = LDL^{T}$ with positive
pivots.

It is important to compare $A = LDL^{T}$ with $A = Q\Lambda Q^{T}$. One is based on pivots (in $D$),
the other is based on eigenvalues (in $\Lambda$). Please do not think that the pivots equal the
eigenvalues. Their signs are the same, but the numbers are entirely different.
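
The difference between pivots and eigenvalues can be seen on a small case. This is a hedged Python sketch, not from the original text; `ldlt_2x2`, `eigenvalues_2x2` and `sum_of_squares` are hypothetical helper names, and the code assumes $a \neq 0$ so that no row exchange is needed.

```python
import math


def ldlt_2x2(a, b, c):
    """Return (multiplier, (pivot1, pivot2)) for [[a, b], [b, c]], assuming a != 0.

    The multiplier b/a sits in L; the pivots a and c - b^2/a sit in D.
    """
    multiplier = b / a
    return multiplier, (a, c - b * b / a)


def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], larger one first."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean + radius, mean - radius


def sum_of_squares(a, b, c, x, y):
    """Evaluate f through completing the square: d1 (x + m y)^2 + d2 y^2."""
    m, (d1, d2) = ldlt_2x2(a, b, c)
    return d1 * (x + m * y) ** 2 + d2 * y ** 2
```

For $a = 1$, $b = 2$, $c = 7$ the pivots are 1 and 3 while the eigenvalues are $4 \pm \sqrt{13}$. The numbers differ, but both pairs are positive and both products equal the determinant 3.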


Positive Definite Matrices: n by n 


For a 2 by 2 matrix, the “positive definite test” uses eigenvalues or determinants or pivots. 
All those numbers must be positive. We hope and expect that the same tests carry over to 
larger matrices. They do. 


6O When an $n$ by $n$ symmetric matrix has one of these four properties, it has them all:
1 All the eigenvalues are positive.

2 All the upper left determinants are positive.

3 All the pivots are positive.

4 $x^{T}Ax$ is positive except at $x = 0$. The matrix is positive definite.


The upper left determinants are 1 by 1, 2 by 2, ..., n by n. The last one is the determinant 
of A. This remarkable theorem ties together the whole linear algebra course—at least for 
symmetric matrices. We believe that two examples are more helpful than a proof. 
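
For larger matrices the pivot test is usually the cheapest one to run. Below is a minimal sketch in plain Python (an illustration, not the book's code): elimination without row exchanges produces the pivots, and running products of the pivots give the upper left determinants. The function names are assumptions made for this example.

```python
def pivots(matrix):
    """Pivots from elimination without row exchanges.

    Stops at the first pivot that is not positive, since the matrix
    then fails the positive definite test.
    """
    rows = [[float(entry) for entry in row] for row in matrix]
    n = len(rows)
    found = []
    for k in range(n):
        pivot = rows[k][k]
        found.append(pivot)
        if pivot <= 0:
            break
        for i in range(k + 1, n):
            factor = rows[i][k] / pivot
            for j in range(k, n):
                rows[i][j] -= factor * rows[k][j]
    return found


def is_positive_definite(matrix):
    """Test 3: a symmetric matrix is positive definite when all n pivots are positive."""
    found = pivots(matrix)
    return len(found) == len(matrix) and all(d > 0 for d in found)


def upper_left_determinants(matrix):
    """The k by k upper left determinant is the product of the first k pivots."""
    dets, product = [], 1.0
    for d in pivots(matrix):
        product *= d
        dets.append(product)
    return dets
```

For the tridiagonal matrix with 2 on the diagonal and $-1$ beside it, the pivots come out near 2, 3/2, 4/3 and the determinants near 2, 3, 4. Ratios of successive determinants recover the pivots.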


Example 6.16 Test the matrices $A$ and $A^*$ for positive definiteness:


$$A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \quad \text{and} \quad A^* = \begin{bmatrix} 2 & -1 & b \\ -1 & 2 & -1 \\ b & -1 & 2 \end{bmatrix}.$$


Solution $A$ is an old friend (or enemy). Its pivots are 2 and $\frac32$ and $\frac43$, all positive. Its upper
left determinants are 2 and 3 and 4, all positive. After some calculation, its eigenvalues are
$2 - \sqrt2$ and 2 and $2 + \sqrt2$, all positive. We can write $x^{T}Ax$ as a sum of three squares (since
$n = 3$). Using $A = LDL^{T}$ the pivots appear outside the squares and the multipliers are
inside:


$$\begin{aligned} x^{T}Ax &= 2(x_1^2 - x_1x_2 + x_2^2 - x_2x_3 + x_3^2) \\ &= 2\left(x_1 - \tfrac12x_2\right)^2 + \tfrac32\left(x_2 - \tfrac23x_3\right)^2 + \tfrac43x_3^2. \end{aligned}$$

Go to the second matrix $A^*$. The determinant test is easiest. The 1 by 1 determinant
is 2 and the 2 by 2 determinant is 3. The 3 by 3 determinant comes from $A^*$ itself:


$$\det A^* = 4 + 2b - 2b^2 \quad \text{must be positive.}$$


At $b = -1$ and $b = 2$ we get $\det A^* = 0$. In those cases $A^*$ is positive semidefinite (no
inverse, zero eigenvalue, $x^{T}A^*x \ge 0$). Between $b = -1$ and $b = 2$ the matrix is positive
definite. The corner entry $b = 0$ in the first example was safely between.

Figure 6.3 The tilted ellipse $5x^2 + 8xy + 5y^2 = 1$. Lined up it is $9X^2 + Y^2 = 1$.


The Ellipse $ax^2 + 2bxy + cy^2 = 1$


Think of a tilted ellipse centered at (0, 0), as in Figure 6.3a. Turn it to line up with the
coordinate axes. That is Figure 6.3b. These two pictures show the geometry behind
$A = Q\Lambda Q^{T}$:

1 The tilted ellipse is associated with $A$. Its equation is $x^{T}Ax = 1$.
2 The lined-up ellipse is associated with $\Lambda$. Its equation is $X^{T}\Lambda X = 1$.
3 The rotation that lines up the ellipse is $Q$.


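Example 6.17 Find the axes of the tilted ellipse $5x^2 + 8xy + 5y^2 = 1$.


Solution Start with the positive definite matrix that matches this function:


$$\text{The function is } \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}. \quad \text{The matrix is } A = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}.$$


The eigenvalues of $A$ are $\lambda_1 = 9$ and $\lambda_2 = 1$. The eigenvectors are $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$. To make
them unit vectors, divide by $\sqrt2$. Then $A = Q\Lambda Q^{T}$ is


$$\begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix} = \frac{1}{\sqrt2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 9 & 0 \\ 0 & 1 \end{bmatrix} \frac{1}{\sqrt2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$


Now multiply both sides by $\begin{bmatrix} x & y \end{bmatrix}$ on the left and $\begin{bmatrix} x \\ y \end{bmatrix}$ on the right. The matrix $\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$
yields $x + y$ and $x - y$. The whole multiplication gives


$$5x^2 + 8xy + 5y^2 = 9\left(\frac{x + y}{\sqrt2}\right)^2 + 1\left(\frac{x - y}{\sqrt2}\right)^2. \tag{6.30}$$

The function is again a sum of two squares. But this is different from completing the
square. The coefficients are not the pivots 5 and 9/5 from $D$, they are the eigenvalues 9
and 1 from $\Lambda$. Inside these squares are the eigenvectors (1, 1) and (1, −1). Previously $L$
was inside the squares and the pivots were outside. Now $Q$ is inside and the $\lambda$’s are outside.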

The axes of the tilted ellipse point along the eigenvectors. This explains why
$A = Q\Lambda Q^{T}$ is called the “principal axis theorem”: it displays the axes. Not only the axis
directions (from the eigenvectors) but also the axis lengths (from the eigenvalues). To see
it all, use capital letters for the new coordinates that line up the ellipse:

$$\frac{x + y}{\sqrt2} = X \quad \text{and} \quad \frac{x - y}{\sqrt2} = Y.$$

The ellipse becomes $9X^2 + Y^2 = 1$. The largest value of $X^2$ is $\frac19$. The point at the end of
the shorter axis has $X = \frac13$ and $Y = 0$. Notice: The bigger eigenvalue $\lambda_1$ gives the shorter
axis, of half-length $1/\sqrt{\lambda_1} = \frac13$. The point at the end of the major axis has $X = 0$ and
$Y = 1$. The smaller eigenvalue $\lambda_2 = 1$ gives the greater length $1/\sqrt{\lambda_2} = 1$. Those are
really half-lengths because we start from (0, 0).

In the $xy$ system, the axes are along the eigenvectors of $A$. In the $XY$ system, the
axes are along the eigenvectors of $\Lambda$, the coordinate axes. Everything comes from the
diagonalization $A = Q\Lambda Q^{T}$.


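6P Suppose $A = Q\Lambda Q^{T}$ is positive definite. Then $x^{T}Ax = 1$ yields an ellipse:


$$\begin{bmatrix} x & y \end{bmatrix} Q\Lambda Q^{T} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} X & Y \end{bmatrix} \Lambda \begin{bmatrix} X \\ Y \end{bmatrix} = \lambda_1X^2 + \lambda_2Y^2 = 1.$$


The half-lengths of the axes are $1/\sqrt{\lambda_1}$ (when $X = 1/\sqrt{\lambda_1}$ and $Y = 0$) and $1/\sqrt{\lambda_2}$.


Note that $A$ must be positive definite. If an eigenvalue is negative (exchange the 4’s
with the 5’s in $A$), we don’t have an ellipse. The sum of squares becomes a difference of
squares: $9X^2 - Y^2 = 1$. This is a hyperbola.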
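
The principal axis computation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the book's code: `principal_axes` is a hypothetical name, the matrix is 2 by 2 and symmetric, and the eigenvector formula $(b, \lambda - a)$ comes from the first row of $(A - \lambda I)v = 0$.

```python
import math


def principal_axes(a, b, c):
    """Axes of the ellipse x^T A x = 1 for the symmetric A = [[a, b], [b, c]].

    Returns a list of (eigenvalue, unit eigenvector, half_length) tuples,
    larger eigenvalue (shorter axis) first. Raises ValueError when A is
    not positive definite, because the curve is then not an ellipse.
    """
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    axes = []
    for lam in (mean + radius, mean - radius):
        if lam <= 0:
            raise ValueError("A is not positive definite: no ellipse")
        # First row of (A - lam I) v = 0 gives v = (b, lam - a).
        vx, vy = b, lam - a
        if vx == 0 and vy == 0:
            # Fall back on the second row: v = (lam - c, b).
            vx, vy = lam - c, b
        if vx == 0 and vy == 0:
            # A is a multiple of I: every direction is an eigenvector.
            vx, vy = (1.0, 0.0) if not axes else (0.0, 1.0)
        norm = math.hypot(vx, vy)
        axes.append((lam, (vx / norm, vy / norm), 1 / math.sqrt(lam)))
    return axes
```

For $5x^2 + 8xy + 5y^2 = 1$ this should give half-lengths $\frac13$ along $(1, 1)/\sqrt2$ and 1 along $(1, -1)/\sqrt2$. Exchanging the 4's and 5's produces a negative eigenvalue and the function raises an error, matching the hyperbola case.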


Problem Set 6.4 


Problems 1-13 are about tests for positive definiteness. 


1 Which of $A_1, A_2, A_3, A_4$ has two positive eigenvalues? Use the test, don’t compute
the $\lambda$’s:


5 6 —] -2 1 10 1 10 
ais F J oe & e os E aad a= P A 
Explain why $c > 0$ (instead of $a > 0$) combined with $ac - b^2 > 0$ is also a complete
test for $\begin{bmatrix} a & b \\ b & c \end{bmatrix}$ to have positive eigenvalues.


2 For which numbers b and c are these matrices positive definite? 


l b 2 4 
A= = 
> J and A i j 


Factor each $A$ into $LU$ and then into $LDL^{T}$.


3 What is the quadratic $f = ax^2 + 2bxy + cy^2$ for these matrices? Complete the
square to write $f$ as a sum of one or two squares $d_1(\quad)^2 + d_2(\quad)^2$.


1 2 1 2 
a=); : and a=; J 


4 Show that $f(x, y) = x^2 + 4xy + 3y^2$ does not have a minimum at (0, 0) even though
it has positive coefficients. Write $f$ as a difference of squares and find a point $(x, y)$
where $f$ is negative.


5 The function $f(x, y) = 2xy$ certainly has a saddle point and not a minimum at (0, 0).
What symmetric matrix $A$ produces this $f$? What are its eigenvalues?


6 Test to see if $A^{T}A$ is positive definite:


1 1 
A= Loz and A=|1 2 and A= a. a, 
0 3 >] 1 2 


7 (Important) If $A$ has independent columns then $A^{T}A$ is square, symmetric and invertible
(Section 4.2). Show why $x^{T}A^{T}Ax$ is positive except when $x = 0$. Then
$A^{T}A$ is more than invertible, it is positive definite.


8 The function $f(x, y) = 3(x + 2y)^2 + 4y^2$ is positive except at (0, 0). What is the
matrix $A$, so that $f = [x \ \ y]A[x \ \ y]^{T}$? Check that the pivots of $A$ are 3 and 4.


9 Find the 3 by 3 matrix $A$ and its pivots, rank, eigenvalues, and determinant:

$$\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} A \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 4(x_1 - x_2 + 2x_3)^2.$$


10 Which 3 by 3 symmetric matrices $A$ produce these functions $f = x^{T}Ax$? Why is
the first matrix positive definite but not the second one?


(a) $f = 2(x_1^2 + x_2^2 + x_3^2 - x_1x_2 - x_2x_3)$


(b) $f = 2(x_1^2 + x_2^2 + x_3^2 - x_1x_2 - x_1x_3 - x_2x_3)$.


11 Compute the three upper left determinants to establish positive definiteness. Verify
that their ratios give the second and third pivots.


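$$A = \begin{bmatrix} 2 & 2 & 0 \\ 2 & 5 & 3 \\ 0 & 3 & 8 \end{bmatrix}$$

12 For what numbers $c$ and $d$ are $A$ and $B$ positive definite? Test the 3 determinants:


$$A = \begin{bmatrix} c & 1 & 1 \\ 1 & c & 1 \\ 1 & 1 & c \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 2 & 3 \\ 2 & d & 4 \\ 3 & 4 & 5 \end{bmatrix}$$

13 Find a matrix with $a > 0$ and $c > 0$ and $a + c > 2b$ that has a negative eigenvalue.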

Problems 14-20 are about applications of the tests. 

14 If $A$ is positive definite then $A^{-1}$ is positive definite. First proof: The eigenvalues
of $A^{-1}$ are positive because ____. Second proof (2 by 2): The entries of


$$A^{-1} = \frac{1}{ac - b^2}\begin{bmatrix} c & -b \\ -b & a \end{bmatrix} \quad \text{pass the test } \_\_\_\_.$$


15 If $A$ and $B$ are positive definite, show that $A + B$ is also positive definite. Pivots and
eigenvalues are not convenient for this; better to prove $x^{T}(A + B)x > 0$ from the
positive definiteness of $A$ and $B$.


16 For a block positive definite matrix, the upper left block $A$ must be positive definite:


$$\begin{bmatrix} x^{T} & y^{T} \end{bmatrix} \begin{bmatrix} A & B \\ B^{T} & C \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \quad \text{reduces to} \quad x^{T}Ax \quad \text{when } y = \_\_\_\_.$$


The complete block test is that $A$ and $C - B^{T}A^{-1}B$ must be positive definite.


17 A positive definite matrix cannot have a zero (or worse, a negative number) on its
diagonal. Show that this matrix is not positive definite:


$$\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 4 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \quad \text{is not positive when } (x_1, x_2, x_3) = (\ \ ,\ \ ,\ \ ).$$


18 The first entry $a_{11}$ of a symmetric matrix $A$ cannot be smaller than all the eigenvalues.
If it were, then $A - a_{11}I$ would have ____ eigenvalues but it has a ____ on the
main diagonal. Similarly no diagonal entry can be larger than all the eigenvalues.


19 If $x$ is an eigenvector of $A$ then $x^{T}Ax =$ ____. Prove that $\lambda$ is positive when $A$ is
positive definite.


20 Give a quick reason why each of these statements is true:


(a) Every positive definite matrix is invertible.

(b) The only positive definite permutation matrix is $P = I$.

(c) The only positive definite projection matrix is $P = I$.

(d) A diagonal matrix with positive diagonal entries is positive definite.


(e) A symmetric matrix with a positive determinant might not be positive definite!

Problems 21-24 use the eigenvalues; Problems 25-27 are based on pivots.


21 For which $s$ and $t$ do these matrices have positive eigenvalues (therefore positive
definite)?


$$A = \begin{bmatrix} s & -4 & -4 \\ -4 & s & -4 \\ -4 & -4 & s \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} t & 3 & 0 \\ 3 & t & 4 \\ 0 & 4 & t \end{bmatrix}$$


22 From $A = Q\Lambda Q^{T}$ compute the positive definite symmetric square root $Q\Lambda^{1/2}Q^{T}$
of each matrix. Check that this square root gives $R^2 = A$:


5 4 10 6 
a=]; | and a= a 


23 You may have seen the equation for an ellipse as $\left(\frac{x}{a}\right)^2 + \left(\frac{y}{b}\right)^2 = 1$. What are $a$ and $b$
when the equation is written as $\lambda_1x^2 + \lambda_2y^2 = 1$? The ellipse $9x^2 + 16y^2 = 1$ has
axes with half-lengths $a =$ ____ and $b =$ ____.


24 Draw the tilted ellipse $x^2 + xy + y^2 = 1$ and find the half-lengths of its axes from
the eigenvalues of the corresponding $A$.


25 With positive pivots in $D$, the factorization $A = LDL^{T}$ becomes $L\sqrt{D}\sqrt{D}L^{T}$.
(Square roots of the pivots give $D = \sqrt{D}\sqrt{D}$.) Then $C = L\sqrt{D}$ yields the Cholesky
factorization $A = CC^{T}$:

3 0 4 8 
From C= k 4 find A. From A= p l find C. 
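26 In the Cholesky factorization $A = CC^{T}$ with $C = L\sqrt{D}$, the ____ of the pivots
are on the diagonal of $C$. Find $C$ (lower triangular) for

$$A = \begin{bmatrix} 9 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 2 & 8 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 7 \end{bmatrix}.$$


27 The symmetric factorization $A = LDL^{T}$ means that $x^{T}Ax = x^{T}LDL^{T}x$. This is


$$\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 1 & 0 \\ b/a & 1 \end{bmatrix} \begin{bmatrix} a & 0 \\ 0 & (ac - b^2)/a \end{bmatrix} \begin{bmatrix} 1 & b/a \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}.$$


Multiplication produces $ax^2 + 2bxy + cy^2 = a\left(x + \frac{b}{a}y\right)^2 +$ ____ $y^2$. The second
pivot completes the square. Test with $a = 2$, $b = 4$, $c = 10$.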
28 Without multiplying A = [5082 -226 ][2 9][ S9 $26], find


(a) the determinant of A (b) the eigenvalues of A


(c) the eigenvectors of A (d) a reason why A is positive definite.

29 For $f_1(x, y) = \frac14x^4 + x^2y + y^2$ and $f_2(x, y) = x^3 + xy - x$ find the second derivative
matrices $A_1$ and $A_2$:


$$A = \begin{bmatrix} \partial^2 f/\partial x^2 & \partial^2 f/\partial x\,\partial y \\ \partial^2 f/\partial y\,\partial x & \partial^2 f/\partial y^2 \end{bmatrix}$$


$A_1$ is positive definite so $f_1$ is concave up (= convex). Find the minimum point of $f_1$
and the saddle point of $f_2$ (where first derivatives are zero).


30 The graph of $z = x^2 + y^2$ is a bowl opening upward. The graph of $z = x^2 - y^2$ is a
saddle. The graph of $z = -x^2 - y^2$ is a bowl opening downward. What is a test for
$z = ax^2 + 2bxy + cy^2$ to have a saddle at (0, 0)?


31 Which values of $c$ give a bowl and which give a saddle for the graph of the function
$z = 4x^2 + 12xy + cy^2$? Describe the graph at the borderline value of $c$.


6.5 Stability and Preconditioning 


Up to now, our approach to Ax = b has been “direct.” We accepted A as it came. We 
attacked it with Gaussian elimination. This section is about “iterative” methods, which 
replace A by a simpler matrix S. The difference T = S — A is moved over to the right side 
of the equation. The problem becomes easier to solve, with S instead of A. But there is a 
price—the simpler system has to be solved over and over. 

An iterative method is easy to invent. Just split $A$ into $S - T$. Then $Ax = b$ is the
same as


$$Sx = Tx + b. \tag{6.31}$$

The novelty is to solve (6.31) iteratively. Each guess $x_k$ leads to the next $x_{k+1}$:

$$Sx_{k+1} = Tx_k + b. \tag{6.32}$$


Start with any $x_0$. Then solve $Sx_1 = Tx_0 + b$. Continue to the second iteration
$Sx_2 = Tx_1 + b$. A hundred iterations are very common, maybe more. Stop when (and if!) the
new vector $x_{k+1}$ is sufficiently close to $x_k$, or when the residual $b - Ax_k$ is near zero. We
can choose the stopping test. Our hope is to get near the true solution, more quickly than by
elimination. When the sequence $x_k$ converges, its limit $x = x_\infty$ does solve equation (6.31).
The proof is to let $k \to \infty$ in equation (6.32).

The two goals of the splitting $A = S - T$ are speed per step and fast convergence
of the $x_k$. The speed of each step depends on $S$ and the speed of convergence depends
on $S^{-1}T$:


1 Equation (6.32) should be easy to solve for $x_{k+1}$. The “preconditioner” $S$ could be
diagonal or triangular. When its $LU$ factorization is known, each iteration step is
fast.

2 The difference $x - x_k$ (this is the error $e_k$) should go quickly to zero. Subtracting
equation (6.32) from (6.31) cancels $b$, and it leaves the error equation:


$$Se_{k+1} = Te_k \quad \text{which means} \quad e_{k+1} = S^{-1}Te_k. \tag{6.33}$$


At every step the error is multiplied by $S^{-1}T$. If $S^{-1}T$ is small, its powers go quickly to
zero. But what is “small”?

The extreme splitting is $S = A$ and $T = 0$. Then the first step of the iteration is the
original $Ax = b$. Convergence is perfect and $S^{-1}T$ is zero. But the cost of that step is
what we wanted to avoid. The choice of $S$ is a battle between speed per step (a simple $S$)
and fast convergence ($S$ close to $A$). Here are some popular choices:


(J) S = diagonal part of A (the iteration is called Jacobi’s method) 
(GS) S = lower triangular part of A (Gauss-Seidel method) 
(SOR) S = combination of Jacobi and Gauss-Seidel (successive overrelaxation) 
(ILU) S = approximate L times approximate U (incomplete LU method). 
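
The first two choices can be sketched in plain Python. This is a minimal illustration, not the book's code; the function names, the tolerance and the step limit are assumptions. Jacobi takes $S$ as the diagonal of $A$, so every component uses the old vector. Gauss-Seidel takes $S$ as the lower triangular part (diagonal included), which amounts to using each new component as soon as it is available.

```python
def jacobi_step(A, b, x):
    """One step of S x_new = T x + b with S = diagonal part of A."""
    n = len(A)
    return [
        (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
        for i in range(n)
    ]


def gauss_seidel_step(A, b, x):
    """One step with S = lower triangular part of A (forward substitution)."""
    n = len(A)
    new = list(x)
    for i in range(n):
        # Entries new[j] with j < i are already updated; j > i are still old.
        s = sum(A[i][j] * new[j] for j in range(n) if j != i)
        new[i] = (b[i] - s) / A[i][i]
    return new


def iterate(step, A, b, x0, tol=1e-10, max_steps=1000):
    """Repeat a step until successive vectors differ by less than tol.

    Returns (approximate solution, number of steps taken).
    """
    x = list(x0)
    for k in range(1, max_steps + 1):
        x_new = step(A, b, x)
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("iteration did not converge")
```

For the small system with $A$ having rows $(2, -1)$ and $(-1, 2)$ and $b = (1, 1)$, both methods should approach $x = (1, 1)$, with Gauss-Seidel needing fewer steps. The stopping test here compares successive vectors; testing the residual $b - Ax_k$ would be an equally reasonable choice.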


Our first question is pure linear algebra: When do the $x_k$’s converge to $x$? The
answer uncovers the number $|\lambda|_{\max}$ that controls convergence. It is the largest eigenvalue
(in absolute value) of the iteration matrix $S^{-1}T$.


The Spectral Radius Controls Convergence 


Equation (6.33) is $e_{k+1} = S^{-1}Te_k$. Every iteration step multiplies the error by the same
matrix $B = S^{-1}T$. The error after $k$ steps is $e_k = B^ke_0$. The error approaches zero if the
powers of $B = S^{-1}T$ approach zero. It is beautiful to see how the eigenvalues of $B$ (the
largest eigenvalue in particular) control the matrix powers $B^k$.


Convergence The powers $B^k$ approach zero if and only if every eigenvalue of $B$ satisfies
$|\lambda| < 1$. The rate of convergence is controlled by the spectral radius $|\lambda|_{\max}$.


The test for convergence is $|\lambda|_{\max} < 1$. Real eigenvalues must lie between $-1$ and 1.
Complex eigenvalues $\lambda = a + ib$ must lie inside the unit circle in the complex plane. In
that case the absolute value $|\lambda|$ is the square root of $a^2 + b^2$. In every case the spectral
radius is the largest distance from the origin 0 to the eigenvalues $\lambda_1, \ldots, \lambda_n$. Those are
eigenvalues of the iteration matrix $B = S^{-1}T$.


To see why $|\lambda|_{\max} < 1$ is necessary, suppose the starting error $e_0$ happens to be an eigenvector
of $B$. After one step the error is $Be_0 = \lambda e_0$. After $k$ steps the error is $B^ke_0 = \lambda^ke_0$.
If we start with an eigenvector, we continue with that eigenvector, and it grows or decays
with the powers $\lambda^k$. This factor $\lambda^k$ goes to zero when $|\lambda| < 1$. Since this condition is
required of every eigenvalue, we need $|\lambda|_{\max} < 1$.

To see why $|\lambda|_{\max} < 1$ is sufficient for the error to approach zero, suppose $e_0$ is a
combination of eigenvectors:


$$e_0 = c_1x_1 + \cdots + c_nx_n \quad \text{leads to} \quad e_k = c_1\lambda_1^kx_1 + \cdots + c_n\lambda_n^kx_n. \tag{6.34}$$

This is the point of eigenvectors! They grow independently, each one controlled by its
eigenvalue. When we multiply by $B$, the eigenvector $x_i$ is multiplied by $\lambda_i$. If all $|\lambda_i| < 1$
then equation (6.34) ensures that $e_k$ goes to zero.


Examples $B = \begin{bmatrix} .6 & .5 \\ .6 & .5 \end{bmatrix}$ has $\lambda_{\max} = 1.1$, $B' = \begin{bmatrix} .6 & 1.1 \\ 0 & .5 \end{bmatrix}$ has $\lambda_{\max} = .6$.

$B^2$ is 1.1 times $B$. Then $B^3$ is $(1.1)^2$ times $B$. The powers of $B$ blow up. Contrast with
the powers of $B'$. The matrix $(B')^k$ has $(.6)^k$ and $(.5)^k$ on its diagonal. The off-diagonal
entries also involve $(.6)^k$, which sets the speed of convergence.
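
The contrast between growing and decaying powers can be checked directly. The sketch below is an illustration in plain Python, not the book's code. The two sample matrices are assumptions chosen so that their largest eigenvalues are 1.1 and 0.6, as in the examples; the helper names are hypothetical.

```python
import cmath


def matmul(X, Y):
    """Product of two 2 by 2 matrices stored as nested lists."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def matrix_power(B, k):
    """B multiplied by itself k times, starting from the identity."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        result = matmul(result, B)
    return result


def spectral_radius(B):
    """Largest |lambda| for a 2 by 2 matrix; cmath handles complex eigenvalues."""
    trace = B[0][0] + B[1][1]
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    return max(abs((trace + root) / 2), abs((trace - root) / 2))


def largest_entry(M):
    """Largest absolute value among the entries of M."""
    return max(abs(entry) for row in M for entry in row)


# Assumed sample matrices: spectral radius 1.1 (powers grow) and 0.6 (powers decay).
B_GROWS = [[0.6, 0.5], [0.6, 0.5]]
B_DECAYS = [[0.6, 1.1], [0.0, 0.5]]
```

Comparing `largest_entry(matrix_power(B_GROWS, 50))` with `largest_entry(matrix_power(B_DECAYS, 50))` should show the first far above 1 and the second very close to zero, in line with the convergence test $|\lambda|_{\max} < 1$.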


Note There is a technical difficulty when $B$ does not have $n$ independent eigenvectors. (To
produce this effect in $B'$, change .5 to .6.) The starting error $e_0$ may not be a combination
of eigenvectors: there are too few for a basis. Then diagonalization is impossible and
equation (6.34) is not correct. We turn to the Jordan form:


$$B = SJS^{-1} \quad \text{and} \quad B^k = SJ^kS^{-1}. \tag{6.35}$$


The Jordan forms $J$ and $J^k$ are made of “blocks” with one repeated eigenvalue:


$$\text{The powers of a 2 by 2 block are} \quad \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}^k = \begin{bmatrix} \lambda^k & k\lambda^{k-1} \\ 0 & \lambda^k \end{bmatrix}.$$

If $|\lambda| < 1$ then these powers approach zero. The extra factor $k$ from a double eigenvalue
is overwhelmed by the decreasing factor $\lambda^{k-1}$. This applies to all Jordan blocks. A larger
block has $k^2\lambda^{k-2}$ in $J^k$, which also approaches zero when $|\lambda| < 1$.

If all $|\lambda| < 1$ then $J^k \to 0$ and $B^k \to 0$. This proves our Theorem: Convergence
requires $|\lambda|_{\max} < 1$.


We emphasize that the same requirement $|\lambda|_{\max} < 1$ is the condition for stability of
a difference equation $u_{k+1} = Au_k$:


$$u_k = A^ku_0 \to 0 \quad \text{if and only if all } |\lambda(A)| < 1.$$


LINEAR TRANSFORMATIONS 


7.1 The Idea of a Linear Transformation 


When a matrix A multiplies a vector v, it produces another vector Av. You could think of 
the matrix as “transforming” the first vector v into the second vector Av. In goes v, out 
comes Av. This transformation follows the same idea as a function. In goes a number x, 
out comes f(x). For one vector v or one number x, we multiply by the matrix or we 
evaluate the function. The deeper goal is to see the complete picture—all v’s and all Av’s. 

Start again with a matrix A. It transforms v to Av. It transforms w to Aw. Then we 
know what happens to u = v + w. There is no doubt about Au, it has to equal Av + Aw. 
Matrix multiplication gives a linear transformation: 


Definition A transformation $T$ assigns an output $T(v)$ to each input vector $v$. The transformation
is linear if it meets these requirements for all $v$ and $w$:


(a) $T(v + w) = T(v) + T(w)$
(b) $T(cv) = cT(v)$ for all $c$.


If the input is $v = 0$, the output must be $T(v) = 0$. Very often requirements (a) and (b) are
combined into one:


Linearity: $T(cv + dw) = cT(v) + dT(w)$.


A linear transformation is highly restricted. Suppose we add $u_0$ to every vector. Then
$T(v) = v + u_0$ and $T(w) = w + u_0$. This isn’t good, or at least it isn’t linear. Applying $T$
to $v + w$ gives $v + w + u_0$. That is not the same as $T(v) + T(w)$:


$$T(v) + T(w) = v + u_0 + w + u_0 \quad \text{does not equal} \quad T(v + w) = v + w + u_0.$$


The exception is when $u_0 = 0$. The transformation reduces to $T(v) = v$. This is the
identity transformation (nothing moves, as in multiplication by $I$). That is certainly linear.
In this case the input space $V$ is the same as the output space $W$.

The transformation $T(v) = v + u_0$ may not be linear but it is “affine.” Straight
lines stay straight, but lines through the origin don’t stay through the origin. Affine means
“linear plus shift.” Its output can be $Av + u_0$, for any matrix $A$.

Computer graphics works with affine transformations. We focus on the $Av$ part,
which is linear.
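
The linearity requirement $T(cv + dw) = cT(v) + dT(w)$ can be tried on sample vectors. The following is a hedged Python sketch, not from the original text: the matrix, the shift $u_0 = (1, 1)$, the sample vectors and all function names are assumptions for illustration. A finite check like this can expose a transformation that is not linear, but it cannot prove linearity.

```python
def combine(c, v, d, w):
    """Return the combination c v + d w, component by component."""
    return tuple(c * vi + d * wi for vi, wi in zip(v, w))


def looks_linear(T, vectors, scalars, tol=1e-9):
    """Check T(c v + d w) = c T(v) + d T(w) on every sample combination."""
    for v in vectors:
        for w in vectors:
            for c in scalars:
                for d in scalars:
                    left = T(combine(c, v, d, w))
                    right = combine(c, T(v), d, T(w))
                    if any(abs(l - r) > tol for l, r in zip(left, right)):
                        return False
    return True


def multiply_by_A(v):
    """T(v) = A v for the assumed sample matrix A = [[2, 1], [0, 3]]."""
    return (2.0 * v[0] + 1.0 * v[1], 3.0 * v[1])


def shift_by_u0(v):
    """The affine map T(v) = v + u0 with the assumed shift u0 = (1, 1)."""
    return (v[0] + 1.0, v[1] + 1.0)


VECTORS = [(1.0, 2.0), (-3.0, 0.5), (0.0, 0.0)]
SCALARS = [-1.0, 0.5, 2.0]
```

Here `looks_linear(multiply_by_A, VECTORS, SCALARS)` should hold, while `looks_linear(shift_by_u0, VECTORS, SCALARS)` should fail: the shift adds $u_0$ once on the left side but $(c + d)u_0$ on the right.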


Example 7.1 Choose a fixed vector $a = (1, 3, 4)$, and let $T(v)$ be the dot product $a \cdot v$:
The input is $v = (v_1, v_2, v_3)$. The output is $T(v) = a \cdot v = v_1 + 3v_2 + 4v_3$.


This is linear. The inputs $v$ come from three-dimensional space, so $V = \mathbf{R}^3$. The outputs
are just numbers, so we can say $W = \mathbf{R}^1$. We are multiplying by the row matrix
$A = [1 \ \ 3 \ \ 4]$. Then $T(v) = Av$.

You will get good at recognizing which transformations are linear. If the output
involves squares or products or lengths, $v_1^2$ or $v_1v_2$ or $\|v\|$, then $T$ is not linear.


Example 7.2 $T(v) = \|v\|$ is not linear. Requirement (a) for linearity would be
$\|v + w\| = \|v\| + \|w\|$. Requirement (b) would be $\|cv\| = c\|v\|$. The first is false (the sides of a
triangle satisfy an inequality $\|v + w\| \le \|v\| + \|w\|$). The second is also false, when we
choose $c = -1$. The length $\|-v\|$ is not $-\|v\|$.


Example 7.3 (More important) T is the transformation that rotates every vector by 30°. 
The domain is the xy plane (where the input vector v is). The range is also the xy plane 
(where the rotated vector T(v) is). We described T without mentioning a matrix: just 
rotate the plane. 

Is rotation linear? Yes it is. We can rotate two vectors and add the results. Or we can 
first add the vectors and then rotate. One result T (v) + T (w) is the same as the other result 
T(v + w). The whole plane is turning together, in this linear transformation. 


Note Transformations have a language of their own. Where there is no matrix, we can’t
talk about a column space. But the idea can be rescued and used. The column space
consisted of all outputs $Av$. The nullspace consisted of all inputs for which $Av = 0$.
Translate those into “range” and “kernel”:


Range of T = set of all outputs T (v): corresponds to column space 
Kernel of T = set of all inputs for which T (v) = 0: corresponds to nullspace. 


The range is a subspace of the output space W. The kernel is a subspace of the input 
space V. When T is multiplication by a matrix, T(v) = Av, you can translate back to 
column space and nullspace. We won’t always say range and kernel when these other 
words are available. 

For an $m$ by $n$ matrix, the nullspace is a subspace of $V = \mathbf{R}^n$. The column space is a
subspace of ____. The range might or might not be the whole output space $W$.


Examples of Transformations (mostly linear) 


Example 7.4 Project every 3-dimensional vector down onto the $xy$ plane. The range is
that plane, which contains every $T(v)$. The kernel is the $z$ axis (which projects down to
zero). This projection is linear.

Example 7.5 Project every 3-dimensional vector onto the horizontal plane $z = 1$. The
vector $v = (x, y, z)$ is transformed to $T(v) = (x, y, 1)$. This transformation is not linear.
Why not?


Example 7.6 Multiply every 3-dimensional vector by a 3 by 3 matrix $A$. This is definitely
a linear transformation!


$$T(v + w) = A(v + w) \quad \text{which equals} \quad Av + Aw = T(v) + T(w).$$


Example 7.7 Suppose $T(v)$ is multiplication by an invertible matrix. The kernel is the
zero vector; the range $W$ equals the domain $V$. Another linear transformation is multiplication
by $A^{-1}$. This is the inverse transformation $T^{-1}$, which brings every vector $T(v)$
back to $v$:


$$T^{-1}(T(v)) = v \quad \text{matches the matrix multiplication} \quad A^{-1}(Av) = v.$$


We are reaching an unavoidable question. Are all linear transformations produced
by matrices? Each $m$ by $n$ matrix does produce a linear transformation from $\mathbf{R}^n$ to $\mathbf{R}^m$.
The rule is $T(v) = Av$. Our question is the converse. When a linear $T$ is described as a
“rotation” or “projection” or ..., is there always a matrix hiding behind it?

The answer is yes. This is an approach to linear algebra that doesn’t start with matrices.
The next section shows that we still end up with matrices.


Linear Transformations of the Plane 


It is more interesting to see a transformation than to define it. When a 2 by 2 matrix $A$
multiplies all vectors in $\mathbf{R}^2$, we can watch how it acts. Start with a “house” in the $xy$ plane.
It has eleven endpoints. Those eleven vectors $v$ are transformed into eleven vectors $Av$.
Straight lines between $v$’s become straight lines between the transformed vectors $Av$. (The
transformation is linear!) Therefore $T$(house) is some kind of a house, possibly stretched
or rotated or otherwise unlivable.

This part of the book is visual not theoretical. We will show six houses and the
matrices that produce them. The columns of $H$ are the eleven circled points of the first
house. ($H$ is 2 by 12, so plot2d connects the last circle to the first.) The 11 points in the
house matrix are transformed by A * H to produce the other houses:


ra 8 Sot 0 7 © 6343 0 T 
Wen Z I Re iy OSes a 
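
The product A * H acts on every column of the point matrix at once. Here is a small plain-Python sketch of that idea. The outline `HOUSE` is a hypothetical five-point shape, not the book's house matrix $H$, and `transform` and `rotation` are illustrative names; plotting is left out.

```python
import math


def transform(A, H):
    """Multiply the 2 by 2 matrix A into every column of the 2 by n matrix H."""
    xs, ys = H
    return [
        [A[0][0] * x + A[0][1] * y for x, y in zip(xs, ys)],
        [A[1][0] * x + A[1][1] * y for x, y in zip(xs, ys)],
    ]


def rotation(degrees):
    """Rotation matrix for a counterclockwise turn through the given angle."""
    t = math.radians(degrees)
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]


# Hypothetical outline (NOT the book's H): row 0 holds x's, row 1 holds y's.
# The first point is repeated at the end so that the outline closes.
HOUSE = [
    [-1.0, -1.0, 0.0, 1.0, 1.0, -1.0],
    [-1.0, 1.0, 2.0, 1.0, -1.0, -1.0],
]
```

Calling `transform(rotation(90), HOUSE)` turns the outline a quarter turn, and `transform([[2, 0], [0, 1]], HOUSE)` stretches it horizontally. Because the map is linear, the segment between two corners goes to the segment between the transformed corners, so only the corner points need to be multiplied.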


Problem Set 7.1 


1 A linear transformation must leave the zero vector fixed: $T(0) = 0$. Prove this
from $T(v + w) = T(v) + T(w)$ by choosing $w =$ ____. Prove it also from
requirement (b) by choosing $c =$ ____.


Figure 7.1 Linear transformations of a house drawn by plot2d(A*H).


2 Requirement (b) gives $T(cv) = cT(v)$ and also $T(dw) = dT(w)$. Then by addition,
requirement (a) gives $T(\quad) = (\quad)$. What is $T(cv + dw + eu)$?


3 Which of these transformations is not linear? The input is $v = (v_1, v_2)$:


(a) $T(v) = (v_2, v_1)$ (b) $T(v) = (v_1, v_1)$ (c) $T(v) = (0, v_1)$
(d) $T(v) = (0, 1)$.


4 If $S$ and $T$ are linear transformations, is $S(T(v))$ linear or quadratic?


(a) (Special case) If $S(v) = v$ and $T(v) = v$, then $S(T(v)) = v$ or $v^2$?
(b) (General case) $S(w_1 + w_2) = S(w_1) + S(w_2)$ and $T(v_1 + v_2) = T(v_1) + T(v_2)$
combine into


$$S(T(v_1 + v_2)) = S(\_\_\_\_) = \_\_\_\_ + \_\_\_\_.$$


5 Suppose $T(v) = v$ except that $T(0, v_2) = (0, 0)$. Show that this transformation
satisfies $T(cv) = cT(v)$ but not $T(v + w) = T(v) + T(w)$.


6 Which of these transformations satisfy $T(v + w) = T(v) + T(w)$ and which satisfy
$T(cv) = cT(v)$?

(a) $T(v) = v/\|v\|$ (b) $T(v) = v_1 + v_2 + v_3$ (c) $T(v) = (v_1, 2v_2, 3v_3)$
(d) $T(v) =$ largest component of $v$.

7 For these transformations of $V = \mathbf{R}^2$ to $W = \mathbf{R}^2$, find $T(T(v))$. Is this transformation
$T^2$ linear?

(a) $T(v) = -v$ (b) $T(v) = v + (1, 1)$

(c) $T(v) = 90°$ rotation $= (-v_2, v_1)$


(d) $T(v) =$ projection $= \left(\frac{v_1 + v_2}{2}, \frac{v_1 + v_2}{2}\right)$.

8 Find the range and kernel (like the column space and nullspace) of $T$:


(a) $T(v_1, v_2) = (v_2, v_1)$ (b) $T(v_1, v_2, v_3) = (v_1, v_2)$
(c) $T(v_1, v_2) = (0, 0)$ (d) $T(v_1, v_2) = (v_1, v_1)$.

9 The “cyclic” transformation $T$ is defined by $T(v_1, v_2, v_3) = (v_2, v_3, v_1)$. What is
$T(T(v))$? What is $T^3(v)$? What is $T^{100}(v)$? Apply $T$ three times and 100 times
to $v$.


10 A linear transformation from $V$ to $W$ has an inverse from $W$ to $V$ when the range
is all of $W$ and the kernel contains only $v = 0$. Why are these transformations not
invertible?


(a) $T(v_1, v_2) = (v_2, v_2)$ $\quad W = \mathbf{R}^2$
(b) $T(v_1, v_2) = (v_1, v_2, v_1 + v_2)$ $\quad W = \mathbf{R}^3$
(c) $T(v_1, v_2) = v_1$ $\quad W = \mathbf{R}^1$


11 If $T(v) = Av$ and $A$ is $m$ by $n$, then $T$ is “multiplication by $A$.”


(a) What are the input and output spaces $V$ and $W$?
(b) Why is range of $T$ = column space of $A$?
(c) Why is kernel of $T$ = nullspace of $A$?

12 Suppose a linear $T$ transforms (1, 1) to (2, 2) and (2, 0) to (0, 0). Find $T(v)$ when


(a) $v = (2, 2)$ (b) $v = (3, 1)$ (c) $v = (-1, 1)$ (d) $v = (a, b)$.


Problems 13-20 may be harder. The input space V contains all 2 by 2 matrices M. 


13 M is any 2 by 2 matrix and A = [17]. The transformation T is defined by T(M) =
AM. What rules of matrix multiplication show that T is linear?


14 Suppose A = [}2]. Show that the range of T is the whole matrix space V and the
kernel is the zero matrix:

(1) If AM = 0 prove that M must be the zero matrix.

(2) Find a solution to AM = B for any 2 by 2 matrix B.


15 Suppose A = [} 2]. Show that the identity matrix I is not in the range of T. Find a
nonzero matrix M such that T(M) = AM is zero.


Suppose T transposes every matrix M. Try to find a matrix A which gives AM = 
M! for every M. Show that no matrix A will do it. To professors: Is this a linear 
transformation that doesn’t come from a matrix? 


The transformation T that transposes every matrix is definitely linear. Which of these 
extra properties are true? 

(a) T*= identity transformation. 

(b) The kernel of T is the zero matrix. 

(c) Every matrix is in the range of T. 

(d) T(M) = —M is impossible. 

Suppose T(M) = [}8][m][89]. Find a matrix with T(M) # 0. Describe all 


matrices with T(M) = 0 (the kernel of T) and all output matrices T (M) (the range 
of T). 


If A Æ 0 and B + 0 then there is a matrix M such that AMB # 0. Show by example 
that M = I might fail. For your example find an M that succeeds. 


20 If A and B are invertible and T(M) = AMB, find T~!(M) inthe fom ( )M(_). 
Questions 21-27 are about house transformations by matrices. The output is T(house).


21 How can you tell from the picture of T(house) that A is

(a) a diagonal matrix?
(b) a rank-one matrix?
(c) a lower triangular matrix?


22 Draw a picture of T(house) for these matrices:

\[ D = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} .7 & .7 \\ .3 & .3 \end{bmatrix} \quad\text{and}\quad U = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}. \]


23 What are the conditions on $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ to ensure that T(house) will

(a) sit straight up?
(b) expand the house by 3 in all directions?
(c) rotate the house with no change in its shape?


24 What are the conditions on det A = ad − bc to ensure that T(house) will

(a) be squashed onto a line?
(b) keep its endpoints in clockwise order (not reflected)?
(c) have the same area as the original house?


25 If one side of the house stays in place, how do you know that A = I?

26 Describe T(house) when T(v) = −v + (1, 0). This T is “affine.”

27 Change the house matrix H to add a chimney.


28 This MATLAB program creates a vector of angles called theta (steps of 2π/50), and then draws
the unit circle and T(circle) = ellipse. You can change A.

    A = [2 1; 1 2]                         % the matrix applied to the circle
    theta = [0:2*pi/50:2*pi];              % angles around the circle
    circle = [cos(theta); sin(theta)];     % one column per point on the unit circle
    ellipse = A * circle;                  % each column is A times a circle point
    plot(circle(1,:), circle(2,:), ellipse(1,:), ellipse(2,:))
    axis([-4 4 -4 4]); axis('square')      % set the axes after plot, which resets them


29 Add two eyes and a smile to the circle in Problem 28. (If one eye is dark and the
other is light, you can tell when the face is reflected across the y axis.) Multiply by
matrices A to get new faces.


30 The first house is drawn by this program plot2d(H). Circles from o and lines from -:

    x = H(1,:)'; y = H(2,:)';              % coordinates of the corners of the house
    plot(x, y, 'o', x, y, '-');            % circles at the corners, lines between them
    axis([-10 10 -10 10]), axis('square')  % set the axes after plot, which resets them

Test plot2d(A' * H) and plot2d(A' * A * H) with the matrices in Figure 7.1.


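Without a computer describe the houses A * H for these matrices A:

\[ \begin{bmatrix} 1 & 0 \\ 0 & .1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} .5 & .5 \\ -.5 & .5 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}. \]
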

7.2 Choice of Basis: Similarity and SVD


This section is about diagonalizing a matrix (any matrix). The example we know best is
$S^{-1}AS = \Lambda$, for square matrices only. By placing the eigenvectors in $S$ we produce the
eigenvalues in $\Lambda$. You can look at this diagonalization in two ways, as a factorization of $A$
or as a good choice of basis:


- Factorization of the matrix: $A = S\Lambda S^{-1}$
- Choice of eigenvector basis: The matrix becomes $\Lambda$.


Chapter 6 emphasized the first way. Matrix multiplication gave $AS = S\Lambda$ (each column is
just $Ax = \lambda x$). This chapter is emphasizing the second way. We change from the standard
basis, where the matrix is $A$, to a better basis. There have to be n independent eigenvectors,
which are the input basis and also the output basis. Then output equals $\Lambda$ times input.

When A is symmetric, its eigenvectors can be chosen orthonormal. The matrix with
those columns is called Q. The diagonal matrix $\Lambda$ is $Q^{-1}AQ$ which is also $Q^TAQ$. The
$S\Lambda S^{-1}$ factorization of a symmetric matrix becomes $A = Q\Lambda Q^T$.


Nothing is new in those paragraphs. But this section moves to something entirely new and 
very important. You know the Fundamental Theorem of Linear Algebra, which is true for 
every matrix. It involves four subspaces—the row space, the column space, and the two 
nullspaces. By the row operations of elimination, we produced bases for those subspaces. 
But those bases are not the best! They are not orthonormal and the matrix did not become 
diagonal. Now we choose the best bases. 


We will explain the new result in two ways, first the bases and then the factorization. Every 
m by n matrix is allowed. 


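7C There are orthonormal bases $v_1, \ldots, v_r$ for the row space and $u_1, \ldots, u_r$ for the column
space such that $Av_i = \sigma_i u_i$. We can also ensure that $\sigma_i > 0$.


7D Singular Value Decomposition Every m by n matrix can be factored into $A = U\Sigma V^T$,
where $U$ and $V$ are orthogonal matrices and $\Sigma$ is diagonal:

\[
A = U\Sigma V^T =
\underbrace{\begin{bmatrix} u_1 & u_2 & \cdots & u_m \end{bmatrix}}_{m \text{ by } m}
\underbrace{\begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & \end{bmatrix}}_{m \text{ by } n}
\underbrace{\begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}^T}_{n \text{ by } n}
\]

The matrix $\Sigma$ has the “singular values” $\sigma_1, \ldots, \sigma_r$ on its diagonal and is otherwise zero.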


Compare with the symmetric case $A = Q\Lambda Q^T$. The orthogonal matrices $U$ and $V$
are no longer the same $Q$. The input basis is not the same as the output basis. The input
basis starts with $v_1, \ldots, v_r$ from the row space. It finishes with any orthonormal basis
$v_{r+1}, \ldots, v_n$ for the nullspace. Similarly the output basis starts with the good $u_1, \ldots, u_r$
in the column space and ends with any orthonormal $u_{r+1}, \ldots, u_m$ in the left nullspace.

You are seeing again the four dimensions $r$ and $n - r$ and $r$ and $m - r$. The orthogonality
of row space to nullspace is here too: $v_1, \ldots, v_r$ are automatically orthogonal to
$v_{r+1}, \ldots, v_n$. We are taking the final step in the Fundamental Theorem of Linear Algebra:
To choose bases that make the matrix diagonal.

Important point: The singular values $\sigma_i$ are not eigenvalues of $A$. In fact $\sigma_i^2$ is an
eigenvalue of $A^TA$ (and also of $AA^T$). Those matrices are symmetric. Their orthogonal
eigenvectors are the v’s and u’s.


This is an extra long section of the book. We don’t expect the first course to reach this 
far. But the SVD is absolutely a high point of linear algebra—the dimensions are right 
and the orthogonality is right and now the bases are right. The Fundamental Theorem of 
Linear Algebra is complete. 


Similarity: $M^{-1}AM$ and $S^{-1}AS$


We begin with a square matrix and one basis. The input space V is $\mathbf{R}^n$ and the output
space W is also $\mathbf{R}^n$. The basis vectors are the columns of $I$. The matrix with this basis is
n by n, and we call it A. The linear transformation is just “multiplication by A.”

Most of this book has been about one fundamental problem—to make the matrix 
simple. We made it triangular in Chapter 2 (by elimination), and we made it diagonal in 
Chapter 6 (by eigenvectors). Now a change in the matrix comes from a change of basis. 

Here are the main facts in advance. When you change the basis for V, the matrix
changes from $A$ to $AM$. Because V is the input space, the matrix $M$ goes on the right (to
come first). When you change the basis for W, the new matrix is $M^{-1}A$. We are dealing
with the output space so $M^{-1}$ is on the left (to come last). If you change both bases in
the same way, the new matrix is $M^{-1}AM$. The good basis vectors are the eigenvectors,
which go into the columns of $M = S$. The matrix becomes $S^{-1}AS = \Lambda$.


7E When the basis consists of the eigenvectors $x_1, \ldots, x_n$, the matrix for T becomes $\Lambda$.


Reason To find column 1 of the matrix, input the first basis vector $x_1$. The transformation
multiplies by $A$. The output is $Ax_1 = \lambda_1 x_1$. This is $\lambda_1$ times the first basis vector plus zero
times the other basis vectors. Therefore the first column of the matrix is $(\lambda_1, 0, \ldots, 0)$. In
the eigenvector basis, the matrix is diagonal.


Example 7.8 Find the eigenvector basis for projection onto the 135° line $y = -x$. The
standard vectors (1, 0) and (0, 1) are projected in Figure 7.2. In the standard basis

\[ A = \begin{bmatrix} .5 & -.5 \\ -.5 & .5 \end{bmatrix}. \]

Solution The eigenvectors for this projection are $x_1 = (1, -1)$ and $x_2 = (1, 1)$. The first
is on the projection line and the second is perpendicular (Figure 7.2). Their projections are
$x_1$ and $0$. The eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 0$. In the eigenvector basis the projection
matrix is

\[ \Lambda = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. \]

Figure 7.2 Projection on the 135° line $y = -x$. Standard basis vs. eigenvector basis.


What if you choose a different basis $v_1 = (2, 0)$ and $v_2 = (1, 1)$? There are two
ways to find the new matrix $B$, and the main point of this page is to show you both ways:


First way Project $v_1$ to reach $(1, -1)$. This is $v_1 - v_2$. The first column of $B$ contains
these coefficients 1 and −1. Project $v_2$ to reach $(0, 0)$. This is $0v_1 + 0v_2$. Column 2
contains 0 and 0. With basis $v_1 = (2, 0)$ and $v_2 = (1, 1)$ the matrix from Figure 7.3 is

\[ B = \begin{bmatrix} 1 & 0 \\ -1 & 0 \end{bmatrix}. \]


Second way Find the matrix $B$ in three steps. Change from the v’s to the standard basis,
using $M$. Project in that standard basis, using $A$. Change back to the v’s with $M^{-1}$:

\[ B_{\text{v's to v's}} = M^{-1}_{\text{standard to v's}} \; A_{\text{standard}} \; M_{\text{v's to standard}} \]

The change of basis matrix $M$ has the v’s in its columns. Then $M^{-1}AM$ is $B$:

\[ B = \begin{bmatrix} .5 & -.5 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} .5 & -.5 \\ -.5 & .5 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1 & 0 \end{bmatrix}. \]

Conclusions The matrix B is the same both ways. The second way shows that $B = M^{-1}AM$.
Then B is similar to A. B and A represent the same transformation T, in this
case a projection. $M$ and $M^{-1}$ only represent the identity transformation $I$. (They are
not identity matrices! Their input and output bases are different.) The product of matrices
$M^{-1}AM$ copies the product of transformations $ITI$.
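
The second way is easy to check numerically. The following is only a sketch in plain Python (the helper names `matmul` and `inverse` are ours, not from the text): it forms $M^{-1}AM$ for the projection onto $y = -x$ with the basis $v_1 = (2, 0)$, $v_2 = (1, 1)$, and compares the traces of the two similar matrices.

```python
def matmul(X, Y):
    """Product of two 2 by 2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(X):
    """Inverse of a 2 by 2 matrix with nonzero determinant."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[0.5, -0.5], [-0.5, 0.5]]  # projection onto y = -x in the standard basis
M = [[2.0, 1.0], [0.0, 1.0]]    # columns are v1 = (2, 0) and v2 = (1, 1)

# Second way: go to the standard basis (M), project (A), come back (M^{-1}).
B = matmul(inverse(M), matmul(A, M))

# Similar matrices share their eigenvalues, so the traces must agree.
trace_A = A[0][0] + A[1][1]
trace_B = B[0][0] + B[1][1]
print(B, trace_A, trace_B)
```

The columns of `B` should reproduce the coefficients found the first way: column 1 holds 1 and −1, column 2 holds 0 and 0.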


7F In one basis $w_1, \ldots, w_n$ the transformation T has matrix A. In another basis $v_1, \ldots, v_n$
the same transformation has matrix B. The identity transformation I from v’s to w’s has
the matrix M (change of basis matrix). Then B is similar to A:

\[ T_{v \text{ to } v} = I_{w \text{ to } v} \; T_{w \text{ to } w} \; I_{v \text{ to } w} \quad\text{leads to}\quad B = M^{-1}AM. \]

It can be shown that A and B have the same eigenvalues and the same determinant.
A is invertible when B is invertible (and when T is invertible).

Figure 7.3 Projection matrix with a new basis. First way: Stay with v’s. Second way:
Go to the standard basis and back by $M^{-1}AM$.


Suppose the v’s are the eigenvectors and the w’s are the standard basis. The change
of basis matrix M is S. Its columns are the eigenvectors written in the standard basis! Then
the similar matrix $B = M^{-1}AM$ is the diagonal matrix $\Lambda = S^{-1}AS$.


Example 7.9 T reflects every vector v across the straight line at angle $\theta$. The output T(v)
is the mirror image of v on the other side of the line. Find the matrix $A$ in the standard
basis and the matrix $\Lambda$ in the eigenvector basis.


Solution The eigenvector $v_1 = (\cos\theta, \sin\theta)$ is on the line. It is reflected to itself so
$\lambda_1 = 1$. The eigenvector $v_2 = (-\sin\theta, \cos\theta)$ is perpendicular to the line. Its reflection
is $-v_2$ on the other side. In this basis the matrix is

\[ \Lambda = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}. \]

Now use the standard basis (1, 0) and (0, 1). Find A by going to the v’s and back. The
change of basis matrix is M = S. Its columns contain the v’s. Then A is $MBM^{-1} = S\Lambda S^{-1}$:

\[
A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}
\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
= \begin{bmatrix} \cos^2\theta - \sin^2\theta & 2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \sin^2\theta - \cos^2\theta \end{bmatrix}.
\tag{7.1}
\]

With the identities for $\cos 2\theta$ and $\sin 2\theta$ we can recognize the reflection matrix of Section 6.1:

\[ A = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}. \tag{7.2} \]

This matrix has $A^2 = I$. Two reflections bring back the original.
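
A quick numerical check of (7.1) and (7.2) can be reassuring. The sketch below (plain Python; the function name `reflection_matrix` and the test angle 0.3 are our own choices) builds $A = S\Lambda S^{-1}$ from the eigenvectors and compares it with the double-angle form.

```python
import math


def matmul(X, Y):
    """Product of two 2 by 2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def reflection_matrix(theta):
    """Build A = S Lambda S^{-1} for reflection across the line at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    S = [[c, -s], [s, c]]              # columns v1 = (c, s) and v2 = (-s, c)
    Lam = [[1.0, 0.0], [0.0, -1.0]]    # eigenvalues 1 and -1
    S_inv = [[c, s], [-s, c]]          # S is orthogonal, so S^{-1} = S^T
    return matmul(matmul(S, Lam), S_inv)


theta = 0.3                            # arbitrary test angle in radians
A = reflection_matrix(theta)
double_angle = [[math.cos(2 * theta), math.sin(2 * theta)],
                [math.sin(2 * theta), -math.cos(2 * theta)]]
A_squared = matmul(A, A)               # two reflections should give the identity
print(A, double_angle, A_squared)
```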


Figure 7.4 Column 1 of the reflection matrix: one step (standard basis) or three steps
(to v’s and back).


The Singular Value Decomposition (SVD)


Now comes a highlight of linear algebra. A is any m by n matrix. It can be square or
rectangular, and we will diagonalize it. Its row space is r-dimensional (inside $\mathbf{R}^n$) and its
column space is r-dimensional (inside $\mathbf{R}^m$). We are going to choose orthonormal bases for
those spaces. The row space basis will be $v_1, \ldots, v_r$ and the column space basis will be
$u_1, \ldots, u_r$.

Start with a 2 by 2 matrix: m = n = 2. Let its rank be r = 2, so it is invertible. Its
row space is the plane $\mathbf{R}^2$ and its column space is also the plane $\mathbf{R}^2$. We want $v_1$ and $v_2$ to
be perpendicular unit vectors, and we want $u_1$ and $u_2$ to be perpendicular unit vectors. As
a specific example, we work with

\[ A = \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix}. \]


First point Why not choose the standard basis? Because then the matrix is not diagonal. 


Second point Why not choose the eigenvector basis? Because that basis is not orthonor- 
mal. 


We are aiming for orthonormal bases that also diagonalize A. The two bases will be
different; one basis cannot do it. When the inputs are $v_1$ and $v_2$, the outputs are $Av_1$ and
$Av_2$. We want those to line up with $u_1$ and $u_2$. The basis vectors have to give $Av_1 = \sigma_1 u_1$
and also $Av_2 = \sigma_2 u_2$. With those vectors as columns you can see what we are asking for:

\[
A \begin{bmatrix} v_1 & v_2 \end{bmatrix}
= \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 \end{bmatrix}
= \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}.
\tag{7.3}
\]

In matrix notation that is $AV = U\Sigma$. The diagonal matrix $\Sigma$ is like $\Lambda$ (capital sigma
versus capital lambda). One contains the singular values $\sigma_1, \sigma_2$ and the other contains the
eigenvalues $\lambda_1, \lambda_2$.

The difference comes from $U$ and $V$. When they both equal $S$, we have $AS = S\Lambda$
which means $S^{-1}AS = \Lambda$. The matrix is diagonalized but the eigenvectors are not
generally orthonormal. The new requirement is that $U$ and $V$ must be orthogonal matrices.
The basis vectors in their columns must be orthonormal:

\[
V^TV = \begin{bmatrix} v_1^T \\ v_2^T \end{bmatrix} \begin{bmatrix} v_1 & v_2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
\]

Thus $V^TV = I$ which means $V^T = V^{-1}$. Similarly $U^TU = I$ and $U^T = U^{-1}$.

7G The Singular Value Decomposition (SVD) has $AV = U\Sigma$ with orthogonal matrices
$U$ and $V$. Then

\[ A = U\Sigma V^{-1} = U\Sigma V^T. \tag{7.4} \]

This is the new factorization: orthogonal times diagonal times orthogonal.


We have two matrices $U$ and $V$ instead of one matrix $S$. But there is a neat way to get $U$
out of the picture and see $V$ by itself: Multiply $A^T$ times $A$.

\[ A^TA = (U\Sigma V^T)^T (U\Sigma V^T) = V\Sigma^T U^T U \Sigma V^T. \tag{7.5} \]

$U^TU$ disappears because it equals $I$. Then $\Sigma^T$ is next to $\Sigma$. Multiplying those diagonal
matrices gives $\sigma_1^2$ and $\sigma_2^2$. That leaves an ordinary factorization of the symmetric matrix
$A^TA$:

\[ A^TA = V \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix} V^T. \tag{7.6} \]

In Chapter 6 we would have called this $Q\Lambda Q^T$. The symmetric matrix was $A$ itself. Now
the symmetric matrix is $A^TA$! And the columns of $V$ are its eigenvectors.

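This tells us how to find V. We are ready to complete the example.

Example 7.10 Find the singular value decomposition of $A = \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix}$.


Solution Compute $A^TA$ and its eigenvectors. Then make them unit vectors:

\[
A^TA = \begin{bmatrix} 5 & 3 \\ 3 & 5 \end{bmatrix}
\quad\text{has eigenvectors}\quad
v_1 = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}
\quad\text{and}\quad
v_2 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}.
\]

The eigenvalues of $A^TA$ are 2 and 8. The v’s are perpendicular, because eigenvectors of
a symmetric matrix that belong to different eigenvalues are perpendicular, and $A^TA$ is
automatically symmetric.

What about $u_1$ and $u_2$? They are quick to find, because $Av_1$ is in the direction of $u_1$
and $Av_2$ is in the direction of $u_2$:

\[
Av_1 = \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}
= \begin{bmatrix} 0 \\ \sqrt{2} \end{bmatrix}.
\quad\text{The unit vector is } u_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]

\[
Av_2 = \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}
= \begin{bmatrix} 2\sqrt{2} \\ 0 \end{bmatrix}.
\quad\text{The unit vector is } u_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.
\]

Since the eigenvalues of $A^TA$ are $\sigma_1^2 = 2$ and $\sigma_2^2 = 8$, the singular values of A are their
square roots. Thus $\sigma_1 = \sqrt{2}$. In the display above, $Av_1$ has that factor $\sqrt{2}$. In fact
$Av_1 = \sigma_1 u_1$ exactly as required. Similarly $\sigma_2 = \sqrt{8} = 2\sqrt{2}$. This factor is also in the
display to give $Av_2 = \sigma_2 u_2$. We have completed the SVD:

\[
A = U\Sigma V^T \quad\text{is}\quad
\begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} \sqrt{2} & 0 \\ 0 & 2\sqrt{2} \end{bmatrix}
\begin{bmatrix} -1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}.
\tag{7.7}
\]

This matrix, and every invertible 2 by 2 matrix, transforms the unit circle to an ellipse.
You can see that in Figure 7.5.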
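
The recipe of this example (eigenvectors of $A^TA$ give the v’s, then $u_i = Av_i/\sigma_i$) can be sketched in a few lines of plain Python. This is an illustration under our own assumptions, not a production algorithm: the function name `svd_2x2_invertible` is hypothetical, the matrix must be 2 by 2 and invertible, and the singular values are returned in decreasing order, so the signs and order of the vectors may differ from the display above.

```python
import math


def svd_2x2_invertible(A):
    """Return (sigmas, us, vs) with A v_i = sigma_i u_i, sigmas decreasing."""
    (a, b), (c, d) = A
    # Entries of the symmetric matrix A^T A = [[p, q], [q, r]].
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    mean = (p + r) / 2
    radius = math.hypot((p - r) / 2, q)
    lams = [mean + radius, mean - radius]      # eigenvalues of A^T A
    if q == 0:
        # A^T A is already diagonal: its eigenvectors are the standard basis.
        raw = [(1.0, 0.0), (0.0, 1.0)] if p >= r else [(0.0, 1.0), (1.0, 0.0)]
    else:
        # (q, lam - p) solves (p - lam) x1 + q x2 = 0.
        raw = [(q, lam - p) for lam in lams]
    sigmas, us, vs = [], [], []
    for lam, (x, y) in zip(lams, raw):
        length = math.hypot(x, y)
        v = (x / length, y / length)           # unit eigenvector of A^T A
        sigma = math.sqrt(lam)                 # singular value (positive here)
        Av = (a * v[0] + b * v[1], c * v[0] + d * v[1])
        sigmas.append(sigma)
        us.append((Av[0] / sigma, Av[1] / sigma))
        vs.append(v)
    return sigmas, us, vs


A = [[2.0, 2.0], [-1.0, 1.0]]
sigmas, us, vs = svd_2x2_invertible(A)
# Rebuild A as the sum of the rank-one pieces sigma_i * u_i * v_i^T.
rebuilt = [[sum(s * u[i] * v[j] for s, u, v in zip(sigmas, us, vs))
            for j in range(2)] for i in range(2)]
print(sigmas, us, vs, rebuilt)
```

Professional codes do not form $A^TA$, because squaring the matrix loses accuracy; the sketch follows the text only to make the steps visible.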

Figure 7.5 $U$ and $V$ are rotations and reflections. $\Sigma$ is a stretching matrix.


One final point about that example. We found the u’s from the v’s. Could we find
the u’s directly? Yes, by multiplying $AA^T$ instead of $A^TA$:

\[ AA^T = (U\Sigma V^T)(V\Sigma^T U^T) = U\Sigma\Sigma^T U^T. \tag{7.8} \]

This time it is $V^TV = I$ that disappears. Multiplying $\Sigma\Sigma^T$ gives $\sigma_1^2$ and $\sigma_2^2$ as before. We
have an ordinary factorization of the symmetric matrix $AA^T$. The columns of $U$ are the
eigenvectors of $AA^T$.


Example 7.11 Compute the eigenvectors $u_1$ and $u_2$ directly from $AA^T$. The eigenvalues
are again $\sigma_1^2 = 2$ and $\sigma_2^2 = 8$. The singular values are still their square roots:

\[
AA^T = \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 2 & -1 \\ 2 & 1 \end{bmatrix}
= \begin{bmatrix} 8 & 0 \\ 0 & 2 \end{bmatrix}.
\]

This matrix happens to be diagonal. Its eigenvectors are (1, 0) and (0, 1). This agrees with
$u_1$ and $u_2$ found earlier, but in the opposite order. Why should we take $u_1$ to be (0, 1)
instead of (1, 0)? Because we have to follow the order of the eigenvalues.
We originally chose $\sigma_1^2 = 2$. The eigenvectors $v_1$ (for $A^TA$) and $u_1$ (for $AA^T$) have
to stay with that choice. If you want the σ’s in decreasing order, that is also possible (and
generally preferred). Then $\sigma_1^2 = 8$ and $\sigma_2^2 = 2$. This exchanges $v_1$ and $v_2$ in $V$, and it
exchanges $u_1$ and $u_2$ in $U$. The new SVD is still correct:

\[
A = U\Sigma V^T \quad\text{is now}\quad
\begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 2\sqrt{2} & 0 \\ 0 & \sqrt{2} \end{bmatrix}
\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}.
\]

The other small bit of freedom is to multiply an eigenvector by −1. The result is still a unit
eigenvector. If we do this to $v_1$ we must also do it to $u_1$, because $Av_1 = \sigma_1 u_1$. It is the
signs of the eigenvectors that keep $\sigma_1$ positive, and it is the order of the eigenvectors that
puts $\sigma_1$ first in $\Sigma$.

Figure 7.6 The SVD chooses orthonormal bases so that $Av_i = \sigma_i u_i$. (The figure shows the row space, the nullspace, the column space, and the nullspace of $A^T$.)


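Example 7.12 Find the SVD of the singular matrix $A = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}$. The rank is r = 1. The
row space has only one basis vector $v_1$. The column space has only one basis vector $u_1$.
We can see those vectors in A, and make them into unit vectors:

\[
v_1 = \text{multiple of row } \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\qquad
u_1 = \text{multiple of column } \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \frac{1}{\sqrt{5}} \begin{bmatrix} 2 \\ 1 \end{bmatrix}.
\]

Then $Av_1$ must equal $\sigma_1 u_1$. It does, with singular value $\sigma_1 = \sqrt{10}$. The SVD could stop
there (it usually doesn’t):

\[
\begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{bmatrix}
\begin{bmatrix} \sqrt{10} \end{bmatrix}
\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}.
\]

It is customary for $U$ and $V$ to be square. The matrices need a second column. The
vector $v_2$ must be orthogonal to $v_1$, and $u_2$ must be orthogonal to $u_1$:

\[
v_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix}
\quad\text{and}\quad
u_2 = \frac{1}{\sqrt{5}} \begin{bmatrix} 1 \\ -2 \end{bmatrix}.
\]

The vector $v_2$ is in the nullspace. It is perpendicular to $v_1$ in the row space. Multiply by A
to get $Av_2 = 0$. We could say that the second singular value is $\sigma_2 = 0$, but this is against
the rules. Singular values are like pivots: only the r nonzeros are counted.

If A is 2 by 2 then all three matrices $U$, $\Sigma$, $V$ are 2 by 2 in the true SVD:

\[
\begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix} = U\Sigma V^T
= \frac{1}{\sqrt{5}} \begin{bmatrix} 2 & 1 \\ 1 & -2 \end{bmatrix}
\begin{bmatrix} \sqrt{10} & 0 \\ 0 & 0 \end{bmatrix}
\frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.
\]

The matrices $U$ and $V$ contain orthonormal bases for all four fundamental subspaces:

first r columns of V: row space of A
last n − r columns of V: nullspace of A
first r columns of U: column space of A
last m − r columns of U: nullspace of $A^T$.
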
The first columns $v_1, \ldots, v_r$ and $u_1, \ldots, u_r$ are the hardest to choose, because $Av_i$ has to
fall in the direction of $u_i$. The last v’s and u’s (in the nullspaces) are easier. As long as
those are orthonormal, the SVD will be correct. The v’s are eigenvectors of $A^TA$ and the
u’s are eigenvectors of $AA^T$. This example has

\[
A^TA = \begin{bmatrix} 5 & 5 \\ 5 & 5 \end{bmatrix}
\quad\text{and}\quad
AA^T = \begin{bmatrix} 8 & 4 \\ 4 & 2 \end{bmatrix}.
\]

Those matrices have the same eigenvalues 10 and 0. The first has eigenvectors $v_1$ and $v_2$,
the second has eigenvectors $u_1$ and $u_2$. Multiplication shows that $Av_1 = \sqrt{10}\, u_1$ and
$Av_2 = 0$. It always happens that $Av_i = \sigma_i u_i$, and we now explain why.

Starting from $A^TAv_i = \sigma_i^2 v_i$, the two key steps are to multiply by $v_i^T$ and by $A$:

\[ v_i^T A^TA v_i = \sigma_i^2 v_i^T v_i \quad\text{gives}\quad \|Av_i\|^2 = \sigma_i^2 \quad\text{so that}\quad \|Av_i\| = \sigma_i \tag{7.10} \]

\[ AA^TAv_i = \sigma_i^2 Av_i \quad\text{gives}\quad u_i = Av_i/\sigma_i \quad\text{as a unit eigenvector of } AA^T. \tag{7.11} \]

Equation (7.10) used the small trick of placing parentheses in $(v_i^TA^T)(Av_i)$. This is
a vector times its transpose, giving $\|Av_i\|^2$. Equation (7.11) placed the parentheses in
$(AA^T)(Av_i)$. This shows that $Av_i$ is an eigenvector of $AA^T$. We divide it by its length $\sigma_i$
to get the unit vector $u_i = Av_i/\sigma_i$. This is the equation $Av_i = \sigma_i u_i$, which says that A is
diagonalized by these outstanding bases.


We will give you our opinion directly. The SVD is the climax of this linear algebra course.
We think of it as the final step in the Fundamental Theorem. First come the dimensions of
the four subspaces. Then their orthogonality. Then the bases which diagonalize A. It is all
in the formula $A = U\Sigma V^T$. Applications are coming (they are certainly important!) but
you have made it to the top.


Polar Decomposition and SVD Applications 


Every complex number has the polar form $re^{i\theta}$. A nonnegative number $r$ multiplies a
number on the unit circle. (Remember that $|e^{i\theta}| = |\cos\theta + i\sin\theta| = 1$.) Thinking of
these numbers as 1 by 1 matrices, $r \ge 0$ corresponds to a positive semidefinite matrix
(call it $H$) and $e^{i\theta}$ corresponds to an orthogonal matrix $Q$. The SVD extends this $re^{i\theta}$
factorization to matrices (even m by n with rectangular Q).


7H Every real square matrix can be factored into $A = QH$, where $Q$ is orthogonal and
$H$ is symmetric positive semidefinite. If A is invertible then H is positive definite.


For the proof we just insert $V^TV = I$ into the middle of the SVD:

\[ A = U\Sigma V^T = (UV^T)(V\Sigma V^T) = (Q)(H). \tag{7.12} \]

The first factor $UV^T$ is $Q$. The product of orthogonal matrices is orthogonal. The second
factor $V\Sigma V^T$ is $H$. It is positive semidefinite because its eigenvalues are in $\Sigma$. If A is
invertible then H is also invertible; it is symmetric positive definite. H is the square root
of $A^TA$. Equation (7.5) says that $H^2 = V\Sigma^2V^T = A^TA$.

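There is also a polar decomposition $A = KQ$ in the reverse order. $Q$ is the same but
now $K = U\Sigma U^T$. This is the square root of $AA^T$ by equation (7.8).


Example 7.13 Find the polar decomposition $A = QH$ from the SVD in Example 7.11:

\[
A = \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} \sqrt{2} & 0 \\ 0 & 2\sqrt{2} \end{bmatrix}
\begin{bmatrix} -1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}
= U\Sigma V^T.
\]

Solution The orthogonal part is $Q = UV^T$. The positive definite part is $H = V\Sigma V^T = Q^{-1}A$:

\[
Q = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} -1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}
= \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}
\]

\[
H = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}
\begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 3/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & 3/\sqrt{2} \end{bmatrix}.
\]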


In mechanics, the polar decomposition separates the rotation (in $Q$) from the stretching
(in $H$). The eigenvalues of $H$ are the singular values of $A$; they give the stretching factors.
The eigenvectors of $H$ are the eigenvectors of $A^TA$; they give the stretching directions (the
principal axes).

The polar decomposition just splits the key equation $Av_i = \sigma_i u_i$ into two steps. The
“H” part multiplies $v_i$ by $\sigma_i$. The “Q” part swings the v direction around to the u direction.
The other order $A = KQ$ swings v’s to u’s first (with the same $Q$). Then $K$ multiplies $u_i$
by $\sigma_i$ to complete the job of A.
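
Both orders of the polar decomposition can be checked from the SVD factors of $A = \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix}$. The sketch below is plain Python with our own helper names (`matmul`, `transpose`); it forms $Q = UV^T$, $H = V\Sigma V^T$ and $K = U\Sigma U^T$ and multiplies them back together.

```python
import math


def matmul(X, Y):
    """Product of two 2 by 2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(X):
    """Transpose of a 2 by 2 matrix."""
    return [[X[j][i] for j in range(2)] for i in range(2)]


s = 1 / math.sqrt(2)
U = [[0.0, 1.0], [1.0, 0.0]]                    # columns u1 = (0, 1), u2 = (1, 0)
Sigma = [[math.sqrt(2), 0.0], [0.0, 2 * math.sqrt(2)]]
V = [[-s, s], [s, s]]                           # columns v1 = (-s, s), v2 = (s, s)

Q = matmul(U, transpose(V))                     # rotation part
H = matmul(matmul(V, Sigma), transpose(V))      # stretching part in A = Q H
K = matmul(matmul(U, Sigma), transpose(U))      # stretching part in A = K Q
A_from_QH = matmul(Q, H)
A_from_KQ = matmul(K, Q)
print(Q, H, K)
```

In this example $K$ comes out diagonal, because $AA^T$ is diagonal; $H$ is not, because $A^TA$ is not.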


The Pseudoinverse 


By choosing good bases, the action of A has become clear. It multiplies $v_i$ in the row space
to give $\sigma_i u_i$ in the column space. The inverse matrix must do the opposite! If $Av = \sigma u$
then $A^{-1}u = v/\sigma$. The singular values of $A^{-1}$ are $1/\sigma$, just as the eigenvalues of $A^{-1}$
are $1/\lambda$. The bases are reversed. The u’s are in the row space of $A^{-1}$, the v’s are now in
the column space.

Until this moment we would have added the words “if $A^{-1}$ exists”. Now we don’t.
A matrix that multiplies $u_i$ to produce $v_i/\sigma_i$ does exist. It is denoted by $A^+$:

\[
A^+ = V\Sigma^+U^T =
\underbrace{\begin{bmatrix} v_1 & \cdots & v_r & \cdots & v_n \end{bmatrix}}_{n \text{ by } n}
\underbrace{\begin{bmatrix} \sigma_1^{-1} & & & \\ & \ddots & & \\ & & \sigma_r^{-1} & \\ & & & \end{bmatrix}}_{n \text{ by } m}
\underbrace{\begin{bmatrix} u_1 & \cdots & u_r & \cdots & u_m \end{bmatrix}^T}_{m \text{ by } m}
\tag{7.13}
\]

$A^+$ is the pseudoinverse of A. It is an n by m matrix. If $A^{-1}$ exists (we said it again),
then $A^+$ is the same as $A^{-1}$. In that case m = n = r and we are inverting $U\Sigma V^T$ to get
$V\Sigma^{-1}U^T$. The new symbol $A^+$ is needed when r < m or r < n. Then A has no two-sided
inverse but it has a pseudoinverse $A^+$ with these properties:

\[ A^+u_i = \frac{1}{\sigma_i} v_i \ \text{ for } i \le r \quad\text{and}\quad A^+u_i = 0 \ \text{ for } i > r. \]

When we know what happens to each basis vector $u_i$, we know $A^+$. The vectors $u_1, \ldots, u_r$
in the column space of A go back to the row space. The other vectors $u_{r+1}, \ldots, u_m$ are in
the left nullspace, and $A^+$ in Figure 7.7 sends them to zero.

Figure 7.7 A is invertible from row space to column space. $A^+$ inverts it.


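Example 7.14 The pseudoinverse of $A = \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix}$ is $A^+ = A^{-1}$, because A is invertible.
Inverting $U\Sigma V^T$ is immediate from Example 7.11 because $U^{-1} = U^T$ and $\Sigma$ is diagonal
and $(V^T)^{-1} = V$:

\[ A^+ = A^{-1} = V\Sigma^{-1}U^T = \begin{bmatrix} 1/4 & -1/2 \\ 1/4 & 1/2 \end{bmatrix}. \]


Example 7.15 Find the pseudoinverse of $A = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}$. This matrix is not invertible. The
rank is 1. The singular value is $\sqrt{10}$ from Example 7.12. That is inverted in $\Sigma^+$:

\[
A^+ = V\Sigma^+U^T
= \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} 1/\sqrt{10} & 0 \\ 0 & 0 \end{bmatrix}
\frac{1}{\sqrt{5}} \begin{bmatrix} 2 & 1 \\ 1 & -2 \end{bmatrix}
= \frac{1}{10} \begin{bmatrix} 2 & 1 \\ 2 & 1 \end{bmatrix}.
\]

$A^+$ also has rank 1. Its column space is the row space of $A = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}$. When A takes (1, 1)
in the row space to (4, 2) in the column space, $A^+$ does the reverse. Every rank one matrix
is a column times a row. With unit vectors $u$ and $v$, that is $A = \sigma uv^T$. Then the best
inverse we have is $A^+ = \frac{1}{\sigma} vu^T$.
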
The product $AA^+$ is $uu^T$, the projection onto the line through $u$. The product $A^+A$
is $vv^T$, the projection onto the line through $v$. For all matrices, $AA^+$ and $A^+A$ are the
projections onto the column space and row space.

Problem 19 will show how $x^+ = A^+b$ is the shortest least squares solution to
$Ax = b$. Any other vector that solves the normal equation $A^TA\hat{x} = A^Tb$ is longer than $x^+$.
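
For the rank one matrix of Example 7.15 these statements are easy to test. The sketch below is plain Python; the right side `b = (3, 1)` and the shifted solution `x_other` are hypothetical choices of ours. It builds $A = \sigma uv^T$ and $A^+ = vu^T/\sigma$, then compares $x^+ = A^+b$ with another solution of the normal equations.

```python
import math

sigma = math.sqrt(10)
u = (2 / math.sqrt(5), 1 / math.sqrt(5))   # unit vector in the column space
v = (1 / math.sqrt(2), 1 / math.sqrt(2))   # unit vector in the row space

A = [[sigma * u[i] * v[j] for j in range(2)] for i in range(2)]       # sigma u v^T
A_plus = [[v[i] * u[j] / sigma for j in range(2)] for i in range(2)]  # v u^T / sigma


def apply(X, y):
    """Multiply a 2 by 2 matrix by a vector."""
    return [X[0][0] * y[0] + X[0][1] * y[1], X[1][0] * y[0] + X[1][1] * y[1]]


b = [3.0, 1.0]                                  # hypothetical right side
x_plus = apply(A_plus, b)                       # candidate shortest solution
x_other = [x_plus[0] + 1.0, x_plus[1] - 1.0]    # add the nullspace vector (1, -1)

# Both vectors give the same A x, so both solve the normal equations.
same_output = all(abs(p - q) < 1e-12
                  for p, q in zip(apply(A, x_plus), apply(A, x_other)))
length_plus = math.hypot(*x_plus)
length_other = math.hypot(*x_other)
print(A_plus, x_plus, length_plus, length_other)
```

Since $x^+$ lies in the row space and $(1, -1)$ lies in the nullspace, the two pieces of `x_other` are perpendicular, which is why its length can only be larger.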


Problem Set 7.2 


Problems 1-6 compute and use the SVD of a particular matrix (not invertible).

1 Compute $A^TA$ and its eigenvalues and unit eigenvectors $v_1$ and $v_2$:

\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}. \]

What is the only singular value $\sigma_1$? The rank of A is r = 1.


2 (a) Compute $AA^T$ and its eigenvalues and unit eigenvectors $u_1$ and $u_2$.
(b) Verify from Problem 1 that $Av_1 = \sigma_1 u_1$. Put numbers into the SVD:

\[
\begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}
= \begin{bmatrix} u_1 & u_2 \end{bmatrix}
\begin{bmatrix} \sigma_1 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} v_1 & v_2 \end{bmatrix}^T.
\]


3 From the u’s and v’s write down orthonormal bases for the four fundamental
subspaces of this matrix A.


4 Draw a picture like Figure 7.5 to show the three steps of the SVD for this A.


5 From $U$, $V$, and $\Sigma$ find the orthogonal matrix $Q = UV^T$ and the symmetric matrix
$H = V\Sigma V^T$. Verify the polar decomposition $A = QH$. This H is only semidefinite
because ____.


6 Compute the pseudoinverse $A^+ = V\Sigma^+U^T$. The diagonal matrix $\Sigma^+$ contains $1/\sigma_1$.
Rename the four subspaces (for A) in Figure 7.7 as four subspaces for $A^+$. Compute
$A^+A$ and $AA^+$.


Problems 7-11 are about the SVD of an invertible matrix.


7 Compute $A^TA$ and its eigenvalues and unit eigenvectors $v_1$ and $v_2$. What are the
singular values $\sigma_1$ and $\sigma_2$ for this matrix A?

\[ A = \begin{bmatrix} 3 & 3 \\ -1 & 1 \end{bmatrix}. \]


8 $AA^T$ has the same eigenvalues $\sigma_1^2$ and $\sigma_2^2$ as $A^TA$. Find unit eigenvectors $u_1$ and $u_2$.
Put numbers into the SVD:

\[
\begin{bmatrix} 3 & 3 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} u_1 & u_2 \end{bmatrix}
\begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}
\begin{bmatrix} v_1 & v_2 \end{bmatrix}^T.
\]


9 In Problem 8, multiply columns times rows to show that $A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T$.
Prove from $A = U\Sigma V^T$ that every matrix of rank r is the sum of r matrices of rank
one.


10 From $U$, $V$, and $\Sigma$ find the orthogonal matrix $Q = UV^T$ and the symmetric matrix
$K = U\Sigma U^T$. Verify the polar decomposition in the reverse order $A = KQ$.


11 The pseudoinverse of this A is the same as ____ because ____.


Problems 12-13 compute and use the SVD of a 1 by 3 rectangular matrix.


12 Compute $A^TA$ and $AA^T$ and their eigenvalues and unit eigenvectors when the matrix
is $A = \begin{bmatrix} 3 & 4 & 0 \end{bmatrix}$. What are the singular values of A?


13 Put numbers into the singular value decomposition of A:

\[
A = \begin{bmatrix} 3 & 4 & 0 \end{bmatrix}
= \begin{bmatrix} u_1 \end{bmatrix}
\begin{bmatrix} \sigma_1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}^T.
\]

Put numbers into the pseudoinverse of A. Compute $AA^+$ and $A^+A$:

\[
A^+ = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}
\begin{bmatrix} 1/\sigma_1 \\ 0 \\ 0 \end{bmatrix}
\begin{bmatrix} u_1 \end{bmatrix}^T.
\]


14 What is the only 2 by 3 matrix that has no pivots and no singular values? What is $\Sigma$
for that matrix? $A^+$ is the zero matrix, but what shape?


15 If det A = 0 how do you know that det $A^+$ = 0?


16 When are the factors in $U\Sigma V^T$ the same as in $Q\Lambda Q^T$? The eigenvalues $\lambda_i$ must be
positive, to equal the $\sigma_i$. Then A must be ____ and positive ____.


Questions 17-20 bring out the main properties of $A^+$ and $x^+ = A^+b$.


17 In Example 6 all matrices have rank one. The vector b is $(b_1, b_2)$.

\[
A = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}, \quad
A^+ = \begin{bmatrix} .2 & .1 \\ .2 & .1 \end{bmatrix}, \quad
AA^+ = \begin{bmatrix} .8 & .4 \\ .4 & .2 \end{bmatrix}, \quad
A^+A = \begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix}.
\]


18 (a) The equation $A^TA\hat{x} = A^Tb$ has many solutions because $A^TA$ is ____.
(b) Verify that $x^+ = A^+b = (.2b_1 + .1b_2, \; .2b_1 + .1b_2)$ does solve $A^TAx^+ = A^Tb$.
(c) $AA^+$ projects onto the column space of A. Therefore ____ projects onto the
nullspace of $A^T$. Then $A^T(AA^+ - I)b = 0$. Then $A^TAx^+ = A^Tb$ and $\hat{x}$ can
be $x^+$.


19 The vector $x^+$ is the shortest possible solution to $A^TA\hat{x} = A^Tb$. The difference
$\hat{x} - x^+$ is in the nullspace of $A^TA$. This is also the nullspace of A (see 4C). Explain
how it follows that

\[ \|\hat{x}\|^2 = \|x^+\|^2 + \|\hat{x} - x^+\|^2. \]

Any other solution $\hat{x}$ has greater length than $x^+$.


20 Every b in $\mathbf{R}^m$ is $p + e$. This is the column space part plus the left nullspace part.
Every x in $\mathbf{R}^n$ is $x_r + x_n$ = (row space part) + (nullspace part). Then

\[ AA^+p = \_\_\_\_, \quad AA^+e = \_\_\_\_, \quad A^+Ax_r = \_\_\_\_, \quad A^+Ax_n = \_\_\_\_. \]


21 Find $A^+$ and $A^+A$ and $AA^+$ for the 2 by 1 matrix whose SVD is

\[
A = \begin{bmatrix} 3 \\ 4 \end{bmatrix}
= \begin{bmatrix} .6 & -.8 \\ .8 & .6 \end{bmatrix}
\begin{bmatrix} 5 \\ 0 \end{bmatrix}
\begin{bmatrix} 1 \end{bmatrix}.
\]


Questions 22-24 are about factorizations of 2 by 2 matrices.


22 A general 2 by 2 matrix A is determined by four numbers. If triangular, it is determined
by three. If diagonal, by two. If a rotation, by one. Check that the total count
is four for each factorization of A:

\[ LU, \quad LDU, \quad QR, \quad U\Sigma V^T, \quad S\Lambda S^{-1}. \]


23 Following Problem 22, check that $LDL^T$ and $Q\Lambda Q^T$ are determined by three numbers.
This is correct because the matrix A is ____.


24 A new factorization! Factor $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ into $A = EH$, where E is lower triangular with
1’s on the diagonal and H is symmetric. When is this impossible?


Part II


Geodesy 


LEVELING NETWORKS 


8.1 Heights by Least Squares 


Our first example in geodesy is leveling, the determination of heights. The problem is to
find the heights $x_1, \ldots, x_n$ at n specified points. What we actually measure is differences
of heights. The height at point i is measured from point j, to give a value $b_{ij}$ (probably not
exact) for the height difference:

\[ x_i - x_j = b_{ij} - \text{error}. \tag{8.1} \]

These differences are measured for certain pairs i, j. From the measurements $b_{ij}$ we are
to estimate the actual heights.

Suppose first that there are no errors in the measurements. Then we expect to solve
the equations exactly. But if you look at the equations for n = 3 points and m = 3
measurements, you will see a difficulty:

\[
\begin{aligned}
x_1 - x_2 &= b_{12} \\
x_2 - x_3 &= b_{23} \\
x_3 - x_1 &= b_{31}.
\end{aligned}
\tag{8.2}
\]

This system of equations is singular. Its coefficient matrix is

\[ A = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \end{bmatrix}. \]

The rows of A add to the zero row. The matrix is not invertible. The determinant of A
has to be zero (we refuse to compute determinants). When we add the three equations, the
result on the left side is zero:

\[ 0 = b_{12} + b_{23} + b_{31}. \tag{8.3} \]
A singular system of equations has two possibilities, no solution or too many: 


1 There is no solution. The measurements b12, b23, b31 do not add to zero and the 
three equations are inconsistent. 
2 The equations are consistent but the solution $x_1, x_2, x_3$ is not unique. There are
infinitely many solutions when the consistency condition in equation (8.3) is met.
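
Both facts can be seen in a small experiment. The sketch below is plain Python with hypothetical heights of our own choosing: it applies the matrix $A$ of (8.2) to a set of exact heights and to the same heights raised by a constant, and sums the resulting differences around the loop.

```python
A = [[1, -1, 0],
     [0, 1, -1],
     [-1, 0, 1]]


def height_differences(x):
    """Return A x: the three differences x1 - x2, x2 - x3, x3 - x1."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


heights = [10.0, 12.5, 11.0]                 # hypothetical exact heights
shifted = [h + 100.0 for h in heights]       # raise every point by the same amount

b = height_differences(heights)
b_shifted = height_differences(shifted)      # same differences: heights are not unique
misclosure = sum(b)                          # b12 + b23 + b31, zero for exact data
print(b, b_shifted, misclosure)
```

The constant shift is the vector $(1, 1, 1)$ in the nullspace of $A$; exact differences cannot detect it.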


For measurements with errors, we expect to be in case 1: no solution. For exact mea- 
surements we must be in case 2: many solutions. This is our situation, and the reason is 
clear: 


We cannot determine absolute heights purely from height differences. One or 
more of the heights x; must be postulated (given a priori). Each fixed height 
is removed from the list of unknowns. 


Suppose the third height is fixed at $x_3 = H$. Our equations become

\[
\begin{aligned}
x_1 - x_2 &= b_{12} \\
x_2 &= b_{23} + H \\
-x_1 &= b_{31} - H.
\end{aligned}
\tag{8.4}
\]


Now we have three equations and only two unknowns. Notice that they add to the same 
consistency equation 0 = b12 + b23 + b31. There are still two possibilities but the second 
is different because the matrix is different: 


1 There is no solution (the measurements are not consistent). 
2 There is exactly one solution (the consistency equation holds). 


In the language of linear algebra, we have a 3 by 2 matrix. The third column corresponding 
to x3 has been removed. The two remaining columns are independent: 


$$A_{\text{reduced}} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ -1 & 0 \end{bmatrix}.$$


The rank is 2 (full column rank). There is either no solution or one solution. The nullspace 
of A contains only the zero vector. 


Remark 8.1 Our problem is closely parallel to computing voltages (instead of heights) in 
an electrical circuit. The consistency equation $0 = b_{12} + b_{23} + b_{31}$ is Kirchhoff’s Voltage
Law: The differences around a loop add to zero. The fixed height $x_3 = H$ is like a fixed
voltage, which allows the other voltages to be uniquely determined. Postulating $x_3 = 0$ is
“grounding a node.”

Section 8.4 will develop further the analogy between heights in leveling networks 
and voltages in electrical networks. This viewpoint is important. It places geodesy into the 
basic framework of applied mathematics. 


For exact measurements, consistency will hold. We can solve two of the equations 
for $x_1$ and $x_2$, and the third equation automatically follows. This is the nice case but in
practice it almost never happens. 




For measurements with errors, we expect the three equations in (8.4) to be incon- 
sistent. They cannot be solved. We look for a “best” solution, which makes an agreed 
measure E of overall system error as small as possible. The solution is best for that error 
measure E. For least squares, we minimize the sum of squares from the m equations: 


$$E^2 = r_1^2 + r_2^2 + r_3^2 = (b_{12} - x_1 + x_2)^2 + (b_{23} + H - x_2)^2 + (b_{31} - H + x_1)^2.$$


This will be our starting point: ordinary least squares. It is not our finishing point. Usually 
it is not agreed that this $E$ is the error measure that should be minimized. Other error
measures give other “best” solutions and here are three of the most important: 


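$$\begin{aligned}
E_{\text{weighted}}^2 &= \frac{r_1^2}{\sigma_1^2} + \frac{r_2^2}{\sigma_2^2} + \frac{r_3^2}{\sigma_3^2} && (\text{weighted } l^2 \text{ norm}) \\
E_{\text{sum}} &= |r_1| + |r_2| + |r_3| && (l^1 \text{ norm}) \\
E_{\max} &= \text{maximum of } \{|r_1|, |r_2|, |r_3|\} && (l^\infty \text{ norm}).
\end{aligned}$$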
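
To compare these measures on one set of residuals, a short Python sketch may help. The residuals and variances below are invented for illustration, and `error_measures` is a hypothetical name.

```python
def error_measures(r, sigma2):
    """Return (E^2, E_weighted^2, E_sum, E_max) for residuals r and variances sigma2."""
    e2 = sum(ri * ri for ri in r)                                  # ordinary sum of squares
    e2_weighted = sum(ri * ri / s2 for ri, s2 in zip(r, sigma2))   # weights 1/sigma_i^2
    e_sum = sum(abs(ri) for ri in r)                               # l^1 measure
    e_max = max(abs(ri) for ri in r)                               # l^infinity measure
    return e2, e2_weighted, e_sum, e_max

r = [0.5, -0.25, 2.0]        # assumed residuals; the third one is large
sigma2 = [1.0, 0.25, 4.0]    # assumed variances; the third measurement is unreliable

print(error_measures(r, sigma2))
```

In this sample the large third residual dominates the unweighted sum of squares, but its large variance reduces its share of the weighted sum.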


Most of our attention will go to weighted least squares. We must explain why the particular
weights are chosen in $E_{\text{weighted}}^2$ and how they affect the estimated solution $\hat{x}_1, \hat{x}_2$.

The quantities $\sigma_1^2, \sigma_2^2, \sigma_3^2$ are variances. They measure the reliabilities of the three
measurements. More reliable measurements have smaller variances and larger weights
(because the weight $1/\sigma^2$ is the reciprocal of the variance). Equations that are weighted
more heavily are solved more exactly when we minimize the overall error $E_{\text{weighted}}^2$.


Remark 8.2 The next chapter gives a detailed discussion of variances and covariances. 
This link to statistics is essential. The useful output from our problem should be the height 
estimates $\hat{x}_i$ and also an indication of their reliability. We want to know the variances of
the output errors $\hat{x}_i - x_i$, given the variances of the input measurement errors $r_i$. It will
be proved that the output variances are smallest when the weights are reciprocals of the
input variances. That is the reason for the weights $1/\sigma_i^2$.

More generally, the optimum weight matrix is the inverse of the covariance matrix. 


Remark 8.3 The error measures $E_{\text{sum}} = \sum |r_i|$ and $E_{\max} = |r_i|_{\max}$ are not quadratic (and
not even differentiable) because of the corners in the absolute value function. Minimization
gives piecewise linear instead of linear equations. Thus $E_{\text{sum}}$ and $E_{\max}$ lead to linear
programming, in which a subset of the equations Ax = b holds exactly. The difficult 
problem is to find that subset. The simplex method is quite efficient as a direct method. 
Iterative methods use weighted least squares at each linear step, with weights taken from 
the preceding iteration (this is called downweighting in geodesy). 

Emax is almost never used but Esum is very helpful. It is more robust than least 
squares, and less willing to conform to wild measurements (outliers). In practice there are 
almost always gross errors among the measurements $b_{ij}$. Observations get wrongly identified;
numbers are wrongly reproduced. Some geodesists estimate that 5% of their data are
infected. A least-squares fit will smooth over these errors too successfully. By minimizing
$E_{\text{sum}}$ instead of $E$, the gross errors can be identified in the residual $b - A\hat{x}$. Those
incorrect observations are removed from the data before computing the final estimate $\hat{x}$ that
minimizes $E_{\text{weighted}}^2$.


Completion of the Example


We return to the three equations for two unknowns: 

$$\begin{aligned}
x_1 - x_2 &= b_{12} \\
x_2 &= b_{23} + H \\
-x_1 &= b_{31} - H.
\end{aligned} \tag{8.5}$$

This is our system Ax = b. It will be solved by least squares, and also by weighted least 
squares. We will choose the weights that are appropriate in this leveling problem. The 


example is small enough to carry through in full detail. 
The matrix A and the right side b are 


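$$A = \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ -1 & 0 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} b_{12} \\ b_{23} + H \\ b_{31} - H \end{bmatrix}.$$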


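For ordinary least squares, with unit weights, the normal equation is $A^{T}A\hat{x} = A^{T}b$. Its
solution is the estimate $\hat{x} = (\hat{x}_1, \hat{x}_2)$ of the unknown heights at the first two observation
sites. The third height was postulated as $x_3 = H$.

Multiply $A\hat{x} = b$ by the 2 by 3 matrix $A^{T}$ to find the equation $A^{T}A\hat{x} = A^{T}b$:


$$\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} = \begin{bmatrix} b_{12} - b_{31} + H \\ b_{23} - b_{12} + H \end{bmatrix}. \tag{8.6}$$
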
This matrix ATA has the properties we expect: It is symmetric and it is invertible. The 


third column of A was removed by postulating x3 = H, leaving two independent columns. 
The inverse of ATA would not be computed in large problems, but here it is easy to do: 


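$$\begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} = \frac{1}{3} \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} b_{12} - b_{31} + H \\ b_{23} - b_{12} + H \end{bmatrix}. \tag{8.7}$$

This gives the unweighted least-squares estimates

$$\begin{aligned}
\hat{x}_1 &= \tfrac{1}{3}(b_{12} + b_{23} - 2b_{31}) + H \\
\hat{x}_2 &= \tfrac{1}{3}(-b_{12} - b_{31} + 2b_{23}) + H.
\end{aligned} \tag{8.8}$$


Note how all heights are raised by the same amount $H$. By fixing $x_3 = H$, we set the
“arbitrary constant.” If heights were measured from a different sea level, all components
of $\hat{x}$ would go up or down together.

Notice also the possibility that the original equations are consistent: $b_{12} + b_{23} + b_{31} = 0$.
In that case the estimate $\hat{x}$ is also the genuine solution $x$. It is the unique solution
to $Ax = b$. Replacing $b_{12} + b_{23} + b_{31}$ by zero gives the exact solution when it exists:


$$\begin{aligned}
x_1 &= -b_{31} + H \\
x_2 &= b_{23} + H \\
x_3 &= H.
\end{aligned} \tag{8.9}$$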


Again all heights move together with H. But least squares is telling us that (8.8) is a better 
estimate than (8.9) when the equations are inconsistent. 
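
A numerical check of (8.6) to (8.8) is sketched below in Python. The measurements are invented (they close with a misclosure of 0.03), and `unweighted_estimate` is a hypothetical name.

```python
def unweighted_estimate(b12, b23, b31, H):
    """Solve the 2 by 2 normal equations (8.6) using the inverse in (8.7)."""
    f1 = b12 - b31 + H          # first component of A^T b
    f2 = b23 - b12 + H          # second component of A^T b
    x1 = (2 * f1 + f2) / 3      # (1/3) [2 1] applied to (f1, f2)
    x2 = (f1 + 2 * f2) / 3      # (1/3) [1 2] applied to (f1, f2)
    return x1, x2

# Assumed sample data: b12 + b23 + b31 = 0.03, so the equations are inconsistent.
b12, b23, b31, H = 1.02, -0.49, -0.50, 10.0
x1, x2 = unweighted_estimate(b12, b23, b31, H)

# Closed-form estimates (8.8) for comparison.
closed1 = (b12 + b23 - 2 * b31) / 3 + H
closed2 = (-b12 - b31 + 2 * b23) / 3 + H

# Residuals r = b - A x_hat of the three equations (8.5).
residuals = [b12 - x1 + x2, b23 + H - x2, b31 - H + x1]
print(x1, x2, residuals)
```

With unit weights each of the three residuals comes out as one third of the misclosure $b_{12} + b_{23} + b_{31}$, which follows directly from (8.8).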


Weighted Least Squares


Change from the unweighted to the weighted error measure: 


$$E^2 = r_1^2 + r_2^2 + r_3^2 \quad \text{becomes} \quad E_{\text{weighted}}^2 = \frac{r_1^2}{\sigma_1^2} + \frac{r_2^2}{\sigma_2^2} + \frac{r_3^2}{\sigma_3^2}.$$


The variances $\sigma_1^2, \sigma_2^2, \sigma_3^2$ represent the spread of the measurement errors around their
mean. For the leveling problem there is a very useful empirical rule: The variance is
proportional to the distance between observation points. Thus we choose


$$\begin{aligned}
\sigma_1^2 &= \sigma_0^2 l_1 = \sigma_0^2 \times \text{distance between sites 1 and 2} \\
\sigma_2^2 &= \sigma_0^2 l_2 = \sigma_0^2 \times \text{distance between sites 2 and 3} \\
\sigma_3^2 &= \sigma_0^2 l_3 = \sigma_0^2 \times \text{distance between sites 3 and 1.}
\end{aligned}$$


The factor $\sigma_0^2$ is the variance of unit weight. In Chapter 9 (where variances and covariances
are properly introduced), this factor $\sigma_0^2$ plays a valuable role. It allows us to rescale a
covariance matrix, when our estimate of this matrix proves (from the actual data) to be
unrealistic. In a statistically perfect world, our a priori covariance matrix would be correct
and the scaling factor would be $\sigma_0^2 = 1$.

Chapter 9 will also give a specific formula (9.64) to estimate $\sigma_0^2$.

We still need to postulate one height: $x_3 = H$. Our three measurements still have
errors $r_1, r_2, r_3$ (and we know their statistics: mean zero and variances $\sigma_0^2 l_1, \sigma_0^2 l_2, \sigma_0^2 l_3$):


$$\begin{aligned}
x_1 - x_2 &= b_{12} - r_1 \\
x_2 &= b_{23} + H - r_2 \\
-x_1 &= b_{31} - H - r_3.
\end{aligned} \tag{8.10}$$


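The best solution $\hat{x}_1, \hat{x}_2$ minimizes $E_{\text{weighted}}^2$. The variances $\sigma_0^2 l_1, \sigma_0^2 l_2, \sigma_0^2 l_3$ are in the
denominators. When we take derivatives of $E_{\text{weighted}}^2$ with respect to $x_1$ and $x_2$, these
numbers $\sigma_0^2 l_1, \sigma_0^2 l_2, \sigma_0^2 l_3$ will appear in the linear equations. Calculus gives two equations
from the $x_1$ derivative and the $x_2$ derivative of the weighted sum of squares:


$$\begin{aligned}
\frac{\partial}{\partial x_1}: &\quad (x_1 - x_2 - b_{12})/(\sigma_0^2 l_1) - (-x_1 - b_{31} + H)/(\sigma_0^2 l_3) = 0 \\
\frac{\partial}{\partial x_2}: &\quad -(x_1 - x_2 - b_{12})/(\sigma_0^2 l_1) + (x_2 - b_{23} - H)/(\sigma_0^2 l_2) = 0.
\end{aligned} \tag{8.11}$$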


It is slightly inconvenient to have these fractions. Matrix notation will be better. The
numbers $1/\sigma_i^2 = 1/(\sigma_0^2 l_i)$ will go into a weighting matrix $C$. Our example has a diagonal
weighting matrix, because the three errors are assumed independent:


$$C = \begin{bmatrix} (\sigma_0^2 l_1)^{-1} & & \\ & (\sigma_0^2 l_2)^{-1} & \\ & & (\sigma_0^2 l_3)^{-1} \end{bmatrix} = \begin{bmatrix} c_1 & & \\ & c_2 & \\ & & c_3 \end{bmatrix}.$$

The next section will do the algebra for a general weighting matrix $C$:
When $Ax = b$ is weighted by $C$, the normal equations become $A^{T}CA\hat{x} = A^{T}Cb$.


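This is exactly equation (8.11). We will use $c_i$ instead of $1/(\sigma_0^2 l_i)$, but you will see the
same coefficients from (8.11) in the following matrix $A^{T}CA$:


$$A^{T}CA = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \end{bmatrix} \begin{bmatrix} c_1 & & \\ & c_2 & \\ & & c_3 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} c_1 + c_3 & -c_1 \\ -c_1 & c_1 + c_2 \end{bmatrix}. \tag{8.12}$$
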

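This is symmetric and invertible and positive definite. With $c_1 = c_2 = c_3 = 1$ it is the
unweighted matrix $A^{T}A$ computed above. Its determinant is $\Delta = c_1c_2 + c_1c_3 + c_2c_3$. The
right side $A^{T}Cb$ is also straightforward:


$$A^{T}Cb = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \end{bmatrix} \begin{bmatrix} c_1 & & \\ & c_2 & \\ & & c_3 \end{bmatrix} \begin{bmatrix} b_{12} \\ b_{23} + H \\ b_{31} - H \end{bmatrix} = \begin{bmatrix} c_1 b_{12} - c_3 b_{31} + c_3 H \\ c_2 b_{23} - c_1 b_{12} + c_2 H \end{bmatrix}. \tag{8.13}$$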


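For the sake of curiosity we explicitly invert $A^{T}CA$ and solve $A^{T}CA\hat{x} = A^{T}Cb$:

$$\begin{aligned}
\begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} &= \frac{1}{\Delta} \begin{bmatrix} c_1 + c_2 & c_1 \\ c_1 & c_1 + c_3 \end{bmatrix} \begin{bmatrix} c_1 b_{12} - c_3 b_{31} + c_3 H \\ c_2 b_{23} - c_1 b_{12} + c_2 H \end{bmatrix} \\
&= \frac{1}{\Delta} \begin{bmatrix} c_1 c_2 (b_{12} + b_{23}) - (c_1 c_3 + c_2 c_3) b_{31} \\ c_1 c_3 (-b_{12} - b_{31}) + (c_1 c_2 + c_2 c_3) b_{23} \end{bmatrix} + \begin{bmatrix} H \\ H \end{bmatrix}.
\end{aligned}$$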


Again all estimated heights move up or down with $H$. The unit weights $c_1 = c_2 = c_3 = 1$
bring back the unweighted $\hat{x}$ computed earlier. These pencil and paper calculations are
never to be repeated (in this book!). They show explicitly how the weights $c_i = 1/(\sigma_0^2 l_i)$
enter into the estimates. The next section derives the key equation $A^{T}CA\hat{x} = A^{T}Cb$.
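
The weighted calculation can be repeated numerically. The following Python sketch builds (8.12) and (8.13) from the line lengths and solves the 2 by 2 system; the data, the lengths, and the name `weighted_estimate` are all assumptions made for illustration.

```python
def weighted_estimate(b12, b23, b31, H, lengths, sigma0_sq=1.0):
    """Solve A^T C A x = A^T C b for the three-measurement example."""
    # Weights c_i = 1 / (sigma_0^2 * l_i), one per measurement.
    c1, c2, c3 = (1.0 / (sigma0_sq * l) for l in lengths)
    # Entries of A^T C A from (8.12).
    k11, k12, k22 = c1 + c3, -c1, c1 + c2
    # Right side A^T C b from (8.13).
    f1 = c1 * b12 - c3 * b31 + c3 * H
    f2 = c2 * b23 - c1 * b12 + c2 * H
    # Invert the symmetric 2 by 2 matrix explicitly.
    det = k11 * k22 - k12 * k12
    x1 = (k22 * f1 - k12 * f2) / det
    x2 = (k11 * f2 - k12 * f1) / det
    return x1, x2

# Assumed sample data with unequal line lengths.
b12, b23, b31, H = 1.02, -0.49, -0.50, 10.0
lengths = (1.0, 2.0, 0.5)
x1, x2 = weighted_estimate(b12, b23, b31, H, lengths)
print(x1, x2)
```

Changing `sigma0_sq` rescales both sides of the normal equations by the same factor, so the estimates do not change. Setting all three lengths to 1 brings back the unweighted estimates (8.8).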


8.2 Weighted Least Squares 


The previous section gave an example of weighted least squares. The weighting matrix C 
was diagonal, because the observation errors were not correlated. The matrix becomes 
$C = I$ (or really $C = \sigma^{-2}I$) when the errors all have the same variance $\sigma^2$. This includes
the i.i.d. case of independent and identically distributed errors. When errors are not
independent, geodesy must deal with any symmetric positive definite matrix $C = \Sigma^{-1}$.
This is the inverse of the covariance matrix $\Sigma$.

This section extends the normal equations $A^{T}CA\hat{x} = A^{T}Cb$ and the basic theory to
allow for $C$. That matrix changes the way we measure the errors $r = b - Ax$. The squared
length $r^{T}r$ becomes $r^{T}Cr$, including the weights:


$$\|r\|^2 = r^{T}r \quad \text{changes to} \quad \|r\|_C^2 = r^{T}Cr.$$


When lengths change, so do inner products: $a^{T}b$ becomes $a^{T}Cb$. Angles change too. Two
vectors $a$ and $b$ are now perpendicular when $a^{T}Cb = 0$.

In this sense, the best combination $A\hat{x}$ of the columns of $A$ is still a perpendicular
projection of $b$. The fundamental equation of least squares still requires that the error
$r = b - A\hat{x}$ shall be perpendicular to all columns of $A$. The inner products $a^{T}Cr$, between
columns of $A$ and the error $r$, are all zero. Those columns are the rows of $A^{T}$, so


$$A^{T}Cr = 0 \quad \text{or} \quad A^{T}C(b - A\hat{x}) = 0 \quad \text{or} \quad A^{T}CA\hat{x} = A^{T}Cb. \tag{8.14}$$


This is the weighted normal equation. 
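
The change of inner product is easy to test on the small example. In this Python sketch the weights and data are assumed values and `c_inner` is a hypothetical helper. It solves the weighted normal equations and then forms both kinds of inner product between the columns of $A$ and the residual.

```python
A = [[1, -1], [0, 1], [-1, 0]]     # reduced matrix of the three-measurement example
c = [1.0, 2.0, 4.0]                # assumed weights c1, c2, c3
b = [1.02, 9.51, -10.50]           # assumed b12, b23 + H, b31 - H with H = 10

def c_inner(u, v, weights):
    """Weighted inner product u^T C v for a diagonal C."""
    return sum(w * ui * vi for w, ui, vi in zip(weights, u, v))

columns = [list(col) for col in zip(*A)]

# K = A^T C A and f = A^T C b, entry by entry.
K = [[c_inner(ci, cj, c) for cj in columns] for ci in columns]
f = [c_inner(ci, b, c) for ci in columns]

# Solve the 2 by 2 system K x = f.
det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
x_hat = [(K[1][1] * f[0] - K[0][1] * f[1]) / det,
         (K[0][0] * f[1] - K[1][0] * f[0]) / det]

# Residual r = b - A x_hat and its inner products with the columns of A.
r = [bi - sum(a * x for a, x in zip(row, x_hat)) for row, bi in zip(A, b)]
weighted = [c_inner(col, r, c) for col in columns]        # a^T C r
plain = [c_inner(col, r, [1.0] * 3) for col in columns]   # a^T r

print(weighted, plain)
```

The weighted products vanish up to roundoff. The ordinary products do not, so in the usual sense the residual is not perpendicular to the columns.
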
Remember the other source of this equation, which is minimization. The vector $\hat{x}$ is
chosen to make $E_{\text{weighted}}^2 = \|r\|_C^2$ as small as possible:


$$\text{Minimize} \quad \|r\|_C^2 = (b - Ax)^{T}C(b - Ax). \tag{8.15}$$


Expand that expression into $x^{T}(A^{T}CA)x - 2x^{T}(A^{T}Cb) + b^{T}Cb = x^{T}Kx - 2x^{T}c +$
constant. Now we have a pure calculus problem. We can set derivatives to zero. That
gives n equations for the n components of x. The equations will be linear because $\|r\|_C^2$ is
quadratic. Matrix algebra will find those minimizing equations, once and for all. They are
$Kx = c$ with $K = A^{T}CA$ and $c = A^{T}Cb$:


Theorem When $K$ is symmetric positive definite, the quadratic $Q(x) = x^{T}Kx - 2x^{T}c$ is
minimized at the point where $K\hat{x} = c$. The minimum value of $Q$, at that point $\hat{x} = K^{-1}c$,
is $Q(\hat{x}) = -c^{T}K^{-1}c$.

Proof Compare $Q(\hat{x})$ with all other $Q(x)$, to show that $Q(\hat{x})$ is smallest:


$$\begin{aligned}
Q(x) - Q(\hat{x}) &= x^{T}Kx - 2x^{T}c - \hat{x}^{T}K\hat{x} + 2\hat{x}^{T}c \\
&= x^{T}Kx - 2x^{T}K\hat{x} + \hat{x}^{T}K\hat{x} \qquad (\text{substitute } K\hat{x} \text{ for } c) \\
&= (x - \hat{x})^{T}K(x - \hat{x}).
\end{aligned}$$


Since $K$ is positive definite, this difference is never negative. $Q(\hat{x})$ is the smallest possible
value. At that point $\hat{x} = K^{-1}c$, the minimum of $Q$ is


$$Q_{\min} = (K^{-1}c)^{T}K(K^{-1}c) - 2(K^{-1}c)^{T}c = -c^{T}K^{-1}c. \tag{8.16}$$


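Corollary The minimum of the weighted error $\|r\|_C^2$ is attained when $(A^{T}CA)\hat{x} = A^{T}Cb$.
The minimum value is


$$\begin{aligned}
\|r\|_C^2 &= \text{minimum of } x^{T}Kx - 2x^{T}c + b^{T}Cb \\
&= -c^{T}K^{-1}c + b^{T}Cb \\
&= -b^{T}CA(A^{T}CA)^{-1}A^{T}Cb + b^{T}Cb.
\end{aligned}$$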


If $A$ were a square invertible matrix, this whole error would reduce to zero! The
solution $\hat{x} = K^{-1}c$ would be exact. The inverse of $K = A^{T}CA$ could be split into
$A^{-1}C^{-1}(A^{T})^{-1}$, and everything cancels. But this splitting is not legal for a rectangular
matrix $A$. Our only assumption is that $A$ has independent columns, which makes $K$ positive
definite (and invertible). The product $A^{T}CA$ is invertible but not its separate factors.
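
The corollary can be checked in exact arithmetic. The Python sketch below uses the standard `fractions` module with assumed weights and data; the name `c_vec` stands for the vector $c = A^{T}Cb$, to avoid a clash with the weights.

```python
from fractions import Fraction as F

A = [[1, -1], [0, 1], [-1, 0]]                    # rectangular, independent columns
C = [F(1), F(2), F(4)]                            # assumed diagonal weights
b = [F(102, 100), F(951, 100), F(-1050, 100)]     # assumed right side

cols = list(zip(*A))

# K = A^T C A and c_vec = A^T C b.
K = [[sum(w * u * v for w, u, v in zip(C, ci, cj)) for cj in cols] for ci in cols]
c_vec = [sum(w * u * bi for w, u, bi in zip(C, ci, b)) for ci in cols]

# x_hat = K^{-1} c_vec for the 2 by 2 matrix K.
det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
x_hat = [(K[1][1] * c_vec[0] - K[0][1] * c_vec[1]) / det,
         (K[0][0] * c_vec[1] - K[1][0] * c_vec[0]) / det]

# Weighted error computed directly from the residual r = b - A x_hat.
r = [bi - sum(a * x for a, x in zip(row, x_hat)) for row, bi in zip(A, b)]
direct = sum(w * ri * ri for w, ri in zip(C, r))

# The same quantity from the corollary: b^T C b - c^T K^{-1} c.
formula = (sum(w * bi * bi for w, bi in zip(C, b))
           - sum(ci * xi for ci, xi in zip(c_vec, x_hat)))

print(direct, formula)
```

Exact fractions are used because the formula subtracts two nearly equal numbers. In floating point the agreement would hold only up to roundoff.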

[Figure label: column space of A = all vectors Ax]

Figure 8.1 The projection is perpendicular in the C-inner product: $a^{T}Cr = 0$ for every
column $a$ of $A$. Then $A^{T}Cr = 0$ is the weighted normal equation. The C-right triangle
has $\|r\|_C^2 + \|A\hat{x}\|_C^2 = \|b\|_C^2$, which is $b^{T}Cb$.


Figure 8.1 shows the projection geometrically. Our eyes use the eye-inner product.
($C = I$ is C = eye in MATLAB.) So visually $r$ does not look perpendicular to its projection.
But the angle is right in the C-inner product, which gives the key equation $A^{T}Cr = 0$.
This is $A^{T}CA\hat{x} = A^{T}Cb$.


Remark 8.4 On notation. Since the time of Gauss the method of least squares has been 
named adjustment theory in geodesy and other applied sciences. The traditional notation 
defines the residual r by 


Ax =b+r. (8.17) 


In agreement with statistics and numerical linear algebra we define the residual with oppo- 
site sign: r = b — Ax. Gauss used the notation P for the weights (Latin: ‘pondus’). For 
various reasons we have chosen to change this notation to C. 


8.3 Leveling Networks and Graphs 


For a closer look at C we need the basic ideas of statistics (variance and covariance). That 
discussion will come in Chapter 9. Here we take a much closer look at the rectangular 
matrix A. This has a special and important form for networks, and the leveling problem 
becomes a basic example in applied mathematics. The matrix A, whose entries are all 1’s 
and —1’s and 0’s, is the incidence matrix for a graph. 

A graph consists of nodes and edges. There are n nodes (the points where the 
heights $x_i$ are to be determined). There is an edge between node $i$ and node $j$ if we
measure the height difference $x_i - x_j$. This makes m edges, with m > n. We will show
how the equations that govern a leveling network fall into the general framework of applied 
mathematics and also into the special pattern involving the incidence matrix of a graph. 

Figure 8.2 shows a graph with four nodes and six edges. This is a directed graph, 
because edge directions are assigned by the arrows. The height difference along edge 1
is $d_1 = x_2 - x_1$. Our actual measurement of this difference is $b_{12}$. The arrow does not
imply that node 2 is higher than node 1; it just specifies the difference as $x_2 - x_1$, rather
than $x_1 - x_2$. (The directions of the arrows are arbitrary, but fixed.) Now we introduce the
incidence matrix or difference matrix or connection matrix of the graph.

Figure 8.2 A graph and its 6 by 4 edge-node incidence matrix.

The incidence matrix A has a row for every edge and a column for every node. In 
our example the matrix is 6 by 4. Each row has two nonzero entries, +1 and —1, to show 
which node the arrow enters and which node it leaves. Thus the nonzero entries in row 1 
(for edge 1) are +1 in column 2, and —1 in column 1: 


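$$A = \begin{bmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 1 \end{bmatrix} \quad \begin{matrix} \leftarrow \text{edge 1} \\ {} \\ {} \\ {} \\ {} \\ {} \end{matrix}$$
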
This matrix contains all information about the graph. This particular example is a “complete
graph,” with no edges missing. (A complete graph has all $m = \frac{1}{2}n(n - 1)$ edges. If
an edge is removed from the graph, a row is removed from the matrix.) We could allow,
but we don’t, double edges between nodes and an edge from a node to itself. A double
edge would just mean that the height difference $x_i - x_j$ was measured twice.
The incidence matrix is more than a passive record of the edge connections in the 
graph. The matrix A is also active; it computes differences. When we apply A to a vector 
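$x = (x_1, x_2, x_3, x_4)$ of heights, the output $Ax$ is a set of six height differences:


$$Ax = \begin{bmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} x_2 - x_1 \\ x_3 - x_1 \\ x_3 - x_2 \\ x_4 - x_1 \\ x_4 - x_2 \\ x_4 - x_3 \end{bmatrix}. \tag{8.18}$$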
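
The rule for writing down $A$ from the graph is mechanical, as the following Python sketch suggests. The edge list is read off from the differences in (8.18); the function names and the sample heights are assumptions.

```python
# Directed edges (start node, end node), numbered as in (8.18).
edges = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4)]

def incidence_matrix(edges, n):
    """One row per edge: -1 at the node the edge leaves, +1 at the node it enters."""
    A = []
    for start, end in edges:
        row = [0] * n
        row[start - 1] = -1
        row[end - 1] = 1
        A.append(row)
    return A

def differences(A, x):
    """Apply A to a vector of heights, giving one height difference per edge."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = incidence_matrix(edges, 4)
heights = [10, 12, 11, 15]          # assumed sample heights x1, ..., x4
print(A)
print(differences(A, heights))
```

Removing an edge from the list removes the corresponding row of $A$. The heights enter only through their differences.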

This is important. These differences are measured by $b_{12}, b_{13}, \ldots, b_{34}$. These measurements
involve errors. The six equations to be solved by weighted least squares are


$$\begin{aligned} x_2 - x_1 &= b_{12} \\ &\;\;\vdots \\ x_4 - x_3 &= b_{34} \end{aligned} \qquad \text{or in matrix form} \quad Ax = b. \tag{8.19}$$


The six equations in four unknowns are probably inconsistent (because of the measurement 
errors $r_1, \ldots, r_6$). We do not expect an exact solution; there are more equations than
unknowns. We form the (weighted) normal equations to arrive at the best estimate $\hat{x}$. But
there is an issue to be dealt with first: 


The four columns of the matrix add to the zero column. Since A has linearly 
dependent columns, the matrices ATA and ATCA will not be invertible. Ac- 
tion must be taken. One or more heights must be fixed! 


In linear algebra, this question is about the “column space” of the matrix. Assuming that 
the graph is connected (it doesn’t separate into two parts with no edges between them) there 
is only one relation between the columns: they add to the zero column. The nullspace is 
one-dimensional, containing the vector (1, 1, 1, 1). The rank is n — 1. If we remove any 
one column, the new matrix A has full rank and the new ATA is invertible. If we fix one 
height (like x3 = H in the previous section), all other heights can be estimated. 

We are free to fix k heights, not just one. Then k columns of A are removed. The 
resulting matrix has full rank n — k. The normal equations will yield the n — k unknown 
heights. Those will be the best (weighted) estimates $\hat{x}$ from the m observations.
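
A small Python check of these statements for the graph of Figure 8.2 follows. It is a sketch: `det` and `gram` are hypothetical helpers, and the cofactor expansion is only sensible for such tiny matrices.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def gram(A):
    """Return A^T A."""
    cols = list(zip(*A))
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]

A = [[-1, 1, 0, 0], [-1, 0, 1, 0], [0, -1, 1, 0],
     [-1, 0, 0, 1], [0, -1, 0, 1], [0, 0, -1, 1]]

# A times (1, 1, 1, 1): every row sums to zero.
ones_image = [sum(row) for row in A]

# With all four columns, A^T A is singular.
full_det = det(gram(A))

# Fix x3 = H: delete the third column, and A^T A becomes invertible.
A_fixed = [row[:2] + row[3:] for row in A]
reduced_det = det(gram(A_fixed))

print(ones_image, full_det, reduced_det)
```

The determinant is zero before a height is fixed and nonzero afterwards. Deleting any other single column would serve equally well.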


Remark 8.5 We plan to insert a special Section 8.4 about the incidence matrix of a graph. 
The dimensions of the four subspaces will give a linear algebra proof of Euler’s famous 
formula, which is the grandfather of topology: 


$$n - m + l = (\text{\# of nodes}) - (\text{\# of edges}) + (\text{\# of loops}) = 1. \tag{8.20}$$


Figure 8.2 has $l = 3$ independent loops. Euler’s alternating sum is $4 - 6 + 3 = 1$. This
formula will be seen as equivalent to the fundamental dimension theorem of linear algebra: 


(dimension of column space) + (dimension of nullspace) = n. (8.21) 
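
Euler's formula (8.20) gives the loop count directly from the node and edge counts of a connected graph. A minimal Python sketch, in which the function name and the list of sample graphs are assumptions:

```python
def independent_loops(n_nodes, m_edges):
    """Number of independent loops l from n - m + l = 1 (connected graph assumed)."""
    return 1 - n_nodes + m_edges

# (nodes, edges) for three connected graphs.
graphs = {
    "complete graph of Figure 8.2": (4, 6),
    "tree with four nodes": (4, 3),
    "single triangle": (3, 3),
}

for name, (n, m) in graphs.items():
    l = independent_loops(n, m)
    print(name, "has", l, "loops; n - m + l =", n - m + l)
```

A tree has no loops, and every edge added to it creates exactly one new independent loop.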


Summary 


Leveling networks are described by directed graphs and incidence matrices A. The graph 
has a node for each height $x_i$; there are k fixed heights and $n - k$ unknown heights. There
are m measurements $b_{ij}$ of height differences. Those correspond to the edges of the graph,
each with a direction arrow. The graph changes to a network when we assign numbers
$c_1, \ldots, c_m$ to the edges.

Each number $c_i$ is the weight of an observation. Statistically $c_i$ is $1/\sigma_i^2$, the reciprocal
of the variance when we measure a height difference. For the leveling problem
$c_i = 1/(\sigma_0^2 l_i)$ is proportional to the inverse length of that edge. These numbers go into the
diagonal matrix $C$ which is m by m. It reflects characteristics about the edges while the
incidence matrix $A$ describes the connections in the network. In leveling networks the
vector $x$ denotes heights, and $Ax$ denotes differences of heights.

All nodes are assigned a height $\hat{x}_1, \ldots, \hat{x}_n$. The difference of heights along a loop is
the sum $(\hat{x}_2 - \hat{x}_1) + (\hat{x}_3 - \hat{x}_2) + (\hat{x}_1 - \hat{x}_3) = 0$ in which everything cancels.


Loop law: Components of Ax add to zero around every loop. 


This is the equivalent in geodesy of Kirchhoff’s Voltage Law for circuits. 

For edge $i$ the weighted error $y_i$ equals $c_i$ times the residual $r_i = (b - A\hat{x})_i$. Together
for all edges this is a vector equation: $y = Cr$. The weighted normal equation is $A^{T}y = 0$
or $A^{T}C(b - A\hat{x}) = 0$. This is the equivalent of Kirchhoff’s Current Law at each node:


Node law: $A^{T}Cr = 0$ at each node.


The node law secures what in statistics is called unbiasedness. We get unbiased estimates
$\hat{x}_i$ for the unknown heights. The expected (average) value of $\hat{x}_i$ is correct.

In statistical applications we are furnished with observational equations of the type
$Ax = b - r$, where $b$ is the observed difference of height and $r$ a residual vector. The
norm of $r$ is to be minimized: $E = \|r\|^2$ is $r^{T}r$ or more generally $r^{T}Cr$. The major part
of a height difference $(Ax)_i$ is determined by the observed height difference $b_i$ while the
remaining part is given by the residual $r_i = b_i - (Ax)_i$. Hence $y = Cr$ evaluates to


$$y = Cb - CAx \quad \text{or} \quad C^{-1}y + Ax = b.$$


This expression links the heights $x$ with the weighted errors $y$. We no longer try to solve
$Ax = b$ which has more equations than unknowns. There is a new term $C^{-1}y$, the
weighted errors. The special case in which $Ax = b$ has a solution is also the one in
which all errors are zero.


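We summarize the basic equations for equilibrium:

$$\begin{aligned} C^{-1}y + Ax &= b \\ A^{T}y &= 0. \end{aligned} \tag{8.22}$$


This system is linear and symmetric. The unknowns are the weighted errors $y$ and the
heights $x$. We may write it in block form as


$$\begin{bmatrix} C^{-1} & A \\ A^{T} & 0 \end{bmatrix} \begin{bmatrix} y \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}. \tag{8.23}$$


We may even use elimination on this block matrix. The pivot is $C^{-1}$, the factor multiplying
row 1 is $A^{T}C$, and subtracting that multiple of row 1 from row 2 eliminates $A^{T}$ below the
pivot. The result is


$$\begin{bmatrix} C^{-1} & A \\ 0 & -A^{T}CA \end{bmatrix} \begin{bmatrix} y \\ x \end{bmatrix} = \begin{bmatrix} b \\ -A^{T}Cb \end{bmatrix}. \tag{8.24}$$

The equation to be solved for $x$ is in the bottom row:

$$A^{T}CAx = A^{T}Cb. \tag{8.25}$$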


Equation (8.25) is the normal equation. The error vector is normal to the columns of $A$ (in
the inner product determined by $C$). We get this equation by substituting $y = C(b - Ax)$
into $A^{T}y = 0$. The weighted errors $y$ are eliminated and this yields equation (8.25) for the
heights $x$.

Figure 8.3 Description of a leveling network. [Diagram: the heights $x_1, \ldots, x_n$ are taken by the incidence matrix $A$ to the height differences $Ax$; with the actual measurements $b$ these give the residual $r = b - Ax$; the weight matrix $C$ gives the weighted errors $y = Cr$; the transpose of $A$ gives the node law $A^{T}y = 0$.]

It is important that at least one height $x_j$ is given beforehand. When $x_j = H$, the jth
height is fixed and the jth column in the original incidence matrix is removed. We eliminate
the nullspace by this maneuver. The resulting matrix is what we finally understand
by $A$; it is m by $n - 1$ and its columns are linearly independent. The square matrix $A^{T}CA$
which is the key for the solution of equation (8.25) for $x$ is an invertible matrix of order
$n - 1$ and with full rank:


$$\underbrace{A^{T}}_{(n-1) \text{ by } m} \; \underbrace{C}_{m \text{ by } m} \; \underbrace{A}_{m \text{ by } (n-1)} = \underbrace{A^{T}CA}_{(n-1) \text{ by } (n-1)}.$$


In practical leveling networks the height is fixed at several nodes. Let their number 
be k and consider the following procedure: 


- Enumerate all nodes from 1 to n, whether the node has a fixed height or not.
- Write down the m by n incidence matrix $A$ and the m by m weight matrix $C$.
- Delete the k columns in $A$ belonging to fixed nodes.
- Bring those k columns, multiplied by the k fixed heights, to the right side and include
  them into $b$. The m measurement equations $Ax = b$ in the $n - k$ unknown heights
  have $b = b_{\text{measured}} - (k \text{ columns})(k \text{ fixed heights})$.
- Calculate $A^{T}CA$ and $A^{T}Cb$ and solve the system $A^{T}CA\hat{x} = A^{T}Cb$.


This procedure is turned into the M-file lev. 
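
The column-deletion steps of the procedure might look as follows in Python. This is a sketch and not the M-file lev: `fix_heights`, its arguments, and the sample numbers are hypothetical, and the final step (forming and solving the normal equations) is left out.

```python
def fix_heights(A, b_measured, fixed):
    """Delete the columns of fixed nodes and move them to the right side.

    `fixed` maps a 0-based column index to the postulated height of that node.
    Returns the reduced matrix, the modified right side, and the free columns.
    """
    n = len(A[0])
    free = [j for j in range(n) if j not in fixed]
    # Keep only the columns that belong to unknown heights.
    A_reduced = [[row[j] for j in free] for row in A]
    # b = b_measured - (fixed columns) * (fixed heights)
    b = [bi - sum(row[j] * h for j, h in fixed.items())
         for row, bi in zip(A, b_measured)]
    return A_reduced, b, free

# The three-node example of Section 8.1, with x3 = H = 10 postulated.
A = [[1, -1, 0], [0, 1, -1], [-1, 0, 1]]
b_measured = [1, 2, -3]             # assumed b12, b23, b31
A_reduced, b, free = fix_heights(A, b_measured, {2: 10})
print(A_reduced, b, free)
```

For this input the right side becomes $(b_{12},\; b_{23} + H,\; b_{31} - H)$, in agreement with (8.5).
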

Example 8.1 For the leveling network depicted in Figure 8.4 as a directed graph, the
following points have fixed heights:

$$\begin{aligned} H_A &= 10.021 \text{ m} \\ H_B &= 10.321 \text{ m} \\ H_C &= 11.002 \text{ m}. \end{aligned}$$

Figure 8.4 Directed graph for a geometric leveling network.


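The observations of height differences and the lengths of the leveling lines $l_i$ are


$$\begin{aligned}
b_1 &= 1.978 \text{ m} & l_1 &= 1.02 \text{ km} \\
b_2 &= 0.732 \text{ m} & l_2 &= 0.97 \text{ km} \\
b_3 &= 0.988 \text{ m} & l_3 &= 1.11 \text{ km} \\
b_4 &= 0.420 \text{ m} & l_4 &= 1.07 \text{ km} \\
b_5 &= 1.258 \text{ m} & l_5 &= 0.89 \text{ km}.
\end{aligned}$$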


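The edge-node incidence matrix is 5 by 5. According to the given rules it looks like


$$A = \left[\; \underbrace{\begin{matrix} -1 & 0 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{matrix}}_{A_1} \;\; \underbrace{\begin{matrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & -1 \end{matrix}}_{A_2} \;\right].$$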


This is the incidence matrix in case all points are free, that is, no height is fixed a priori.
But on the contrary we want to keep the height of nodes A, B, and C fixed at given values.
This means that the first three columns of $A$ must be deleted and the right side is modified
accordingly. The modified observation equations $Ax = b - r$ are


$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} H_D \\ H_E \end{bmatrix} = \begin{bmatrix} 1.978 + 10.021 \\ 0.732 + 10.021 \\ 0.988 + 11.002 \\ 0.420 + 10.321 \\ 1.258 \end{bmatrix} - \begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \end{bmatrix}.$$
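
With $\sigma_0^2 = 1$, the weighted matrix for the two unknowns (at D and E) is


$$\begin{aligned}
A^{T}CA &= \begin{bmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} 0.980 & 0 & 0 & 0 & 0 \\ 0 & 1.031 & 0 & 0 & 0 \\ 0 & 0 & 0.901 & 0 & 0 \\ 0 & 0 & 0 & 0.935 & 0 \\ 0 & 0 & 0 & 0 & 1.124 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & -1 \end{bmatrix} \\
&= \begin{bmatrix} 3.0050 & -1.1240 \\ -1.1240 & 3.0900 \end{bmatrix}.
\end{aligned}$$
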
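And accordingly the right side is

$$\begin{aligned}
A^{T}Cb &= \begin{bmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} 0.980 & 0 & 0 & 0 & 0 \\ 0 & 1.031 & 0 & 0 & 0 \\ 0 & 0 & 0.901 & 0 & 0 \\ 0 & 0 & 0 & 0.935 & 0 \\ 0 & 0 & 0 & 0 & 1.124 \end{bmatrix} \begin{bmatrix} 11.999 \\ 10.753 \\ 11.990 \\ 10.741 \\ 1.258 \end{bmatrix} \\
&= \begin{bmatrix} 23.9760 \\ 19.7152 \end{bmatrix}.
\end{aligned}$$
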
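Now the normal equations are $A^{T}CA\hat{x} = A^{T}Cb$:

$$\begin{bmatrix} 3.0050 & -1.1240 \\ -1.1240 & 3.0900 \end{bmatrix} \begin{bmatrix} \hat{H}_D \\ \hat{H}_E \end{bmatrix} = \begin{bmatrix} 23.9760 \\ 19.7152 \end{bmatrix}.$$
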
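The solution is

$$\hat{H}_D = 11.9976 \text{ m}; \qquad \hat{H}_E = 10.7445 \text{ m};$$

$$\hat{p} = A\hat{x} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 11.9976 \\ 10.7445 \end{bmatrix} = \begin{bmatrix} 11.9976 \\ 10.7445 \\ 11.9976 \\ 10.7445 \\ 1.2531 \end{bmatrix};$$

$$\hat{r} = b - \hat{p} = \begin{bmatrix} 0.0014 \\ 0.0085 \\ -0.0076 \\ -0.0035 \\ 0.0049 \end{bmatrix}.$$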

8.4 Graphs and Incidence Matrices 


Any time you have a connected system, with each part depending on other parts, you have 
a matrix. Linear algebra deals with interacting systems, provided the laws that govern them 
are linear. One special model appears so often, and has become so basic and useful, that
we always put it first. The model consists of nodes connected by edges. It appears in the 
leveling problem, and now we want to go beyond that problem—to see the linear algebra 
of graphs. 

This section is entirely optional. It relates the leveling problem to the problem of 
voltages and currents in an electrical network. The incidence matrix $A$ appears in Kirchhoff’s
Current Law, the weights $c_i = 1/(\sigma_0^2 l_i)$ become conductances (= 1/resistances),
and the height observations $b_{ij}$ turn into batteries!

A graph of the usual kind displays a function like f(x). Graphs of a different kind 
(m edges connecting n nodes) lead to matrices. This section is about the incidence matrix A 
of a graph—and especially about the four subspaces that come with it. 

For any m by n matrix there are two subspaces in $\mathbf{R}^n$ and two in $\mathbf{R}^m$. They are
the column spaces and nullspaces of $A$ and $A^{T}$. Their dimensions are related by the most
important theorem in linear algebra. The second part of that theorem is the orthogonality of 
the subspaces. Our goal is to show how examples from graphs illuminate the Fundamental 
Theorem of Linear Algebra. 

We review the four subspaces (for any matrix). Then we construct a directed graph 
and its incidence matrix. The dimensions will be easy to discover. But we want the 
subspaces themselves—this is where orthogonality helps. It is essential to connect the 
subspaces to the graph they come from. By specializing to incidence matrices, the laws 
of linear algebra become Kirchhoff’s laws. Please don’t be put off by the words “current” 
and “potential” and “Kirchhoff.” These rectangular matrices are the best. 

Every entry of an incidence matrix is 0 or 1 or $-1$. This continues to hold during
elimination. All pivots and multipliers are $\pm 1$. Therefore both factors in $A = LU$ also
contain 0, 1, $-1$. Remarkably this persists for the nullspaces of $A$ and $A^{T}$. All four
subspaces have basis vectors with these exceptionally simple components. The matrices are
not concocted for a textbook, they come from a model that is absolutely essential in pure 
and applied mathematics. 


Review of the Four Subspaces 


Start with an m by n matrix. Its columns are vectors in $\mathbf{R}^m$. Their linear combinations
produce the column space, a subspace of $\mathbf{R}^m$. Those combinations are exactly the
matrix-vector products $Ax$. So if we regard $A$ as a linear transformation (taking $x$ to $Ax$), the
column space is its range. Call this subspace $R(A)$.

The rows of $A$ have n components. They are vectors in $\mathbf{R}^n$ (or they would be, if they
were column vectors). Their linear combinations produce the row space. To avoid any
inconvenience with rows, we transpose the matrix. The row space becomes $R(A^{T})$, the
column space of $A^{T}$.

The central questions of linear algebra come from these two ways of looking at the 
same numbers, by columns and by rows.

The nullspace $N(A)$ contains every $x$ that satisfies $Ax = 0$; this is a subspace of $\mathbf{R}^n$.
The “left” nullspace contains all solutions to $A^{T}y = 0$. Now $y$ has m components, and
$N(A^{T})$ is a subspace of $\mathbf{R}^m$. Written as $y^{T}A = 0^{T}$, we are combining rows of $A$ to produce
the zero row. The four subspaces are illustrated by Figure 8.5, which shows $\mathbf{R}^n$ on one side
and $\mathbf{R}^m$ on the other. The link between them is $A$.

Figure 8.5 The four subspaces with their dimensions and orthogonality.

The information in that figure is crucial. First come the dimensions, which obey the 
two central laws of linear algebra: 


$$\dim R(A) = \dim R(A^{T}) \quad \text{and} \quad \dim R(A) + \dim N(A) = n.$$


When the row space has dimension r, the nullspace has dimension n—r. Elimination leaves 
these two spaces unchanged, and it changes A into its echelon form U. The dimension 
count is easy for U. There are r rows and columns with pivots. There are n — r columns 
without pivots, and those lead to vectors in the nullspace. 

The pivot rows are a basis for the row space. The pivot columns are a basis for the 
column space of the echelon form U—we are seeing R(U) and not R(A). The columns 
are changed by elimination, but their dependence or independence is not changed. The 
following matrix A comes from a graph, and its echelon form is U: 


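$$A = \begin{bmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 1 \end{bmatrix} \quad \text{and} \quad U = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$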


The key is that Ax = 0 exactly when Ux = 0. Columns of A are dependent exactly 
when corresponding columns of U are dependent. The nullspace stays the same, and the 
dimension of the column space is unchanged. In this example the nullspace is the line
through x = (1, 1, 1, 1). The column spaces of A and U have dimension r = 3. 
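
The reduction of $A$ to $U$ can be sketched in Python as follows. This is one possible elimination order (first nonzero entry as pivot, with a row exchange when needed), written with exact fractions; `echelon_form` is a hypothetical name, and other pivoting rules could produce a different echelon form for other matrices.

```python
from fractions import Fraction

def echelon_form(A):
    """Reduce A to an echelon form U by elimination; return U and the rank."""
    U = [[Fraction(v) for v in row] for row in A]
    m, n = len(U), len(U[0])
    pivot_row = 0
    for col in range(n):
        # Look for a nonzero entry in this column, at or below the pivot row.
        candidates = [i for i in range(pivot_row, m) if U[i][col] != 0]
        if not candidates:
            continue                      # no pivot in this column
        i = candidates[0]
        U[pivot_row], U[i] = U[i], U[pivot_row]
        # Subtract multiples of the pivot row from the rows below it.
        for k in range(pivot_row + 1, m):
            factor = U[k][col] / U[pivot_row][col]
            U[k] = [a - factor * p for a, p in zip(U[k], U[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    return U, pivot_row

A = [[-1, 1, 0, 0], [-1, 0, 1, 0], [0, -1, 1, 0],
     [-1, 0, 0, 1], [0, -1, 0, 1], [0, 0, -1, 1]]
U, rank = echelon_form(A)
print([[int(v) for v in row] for row in U], rank)
```

For this matrix every pivot and multiplier is $\pm 1$, so the entries of $U$ stay among 0, 1, $-1$.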

Figure 8.5 shows more—the subspaces are orthogonal. The nullspace is perpen- 
dicular to the row space. This comes directly from the m equations Ax = 0. The first 
equation says that x is orthogonal to the first row of A (to produce the first zero). The last 
equation says that x is orthogonal to the last row of A (to produce the last zero). For A 
and U above, x = (1, 1, 1, 1) is perpendicular to all rows and thus to the whole row space. 

This review of the subspaces applies to any matrix A—only the example was special. 
Now we concentrate on that example. It is the incidence matrix for a particular graph, and 
we look to the graph for the meaning of each subspace. 


Directed Graphs and Incidence Matrices 


Figure 8.2 displays a graph. It has m = 6 edges and n = 4 nodes. The incidence matrix 
tells which nodes are connected by which edges. It also tells the directions of the arrows 
(this is a directed graph). The entries —1 and +1 in the first row of A give a record of the 
first edge. Row numbers are edge numbers, column numbers are node numbers. 

Edge 2 goes from node 1 to node 3, so $a_{21} = -1$ and $a_{23} = +1$. Each row shows
which node the edge leaves (by —1), and which node it enters (by +1). You can write 
down A by looking at the graph. 
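
The rule "−1 where the edge leaves, +1 where it enters" can be sketched in a few lines of Python. This is only an illustration: the names `EDGES` and `incidence_matrix` are our own, and the edge directions are read off from the rows of the matrix A displayed earlier.

```python
# Edges of the six-edge example as (from_node, to_node), nodes numbered from 1.
# The list is taken from the rows of A in the text; the names are illustrative.
EDGES = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4)]

def incidence_matrix(edges, n_nodes):
    """Row i has -1 at the node that edge i leaves and +1 at the node it enters."""
    A = []
    for start, end in edges:
        row = [0] * n_nodes
        row[start - 1] = -1
        row[end - 1] = 1
        A.append(row)
    return A

A = incidence_matrix(EDGES, 4)
for row in A:
    print(row)

# Every row sums to zero, so A times (1, 1, 1, 1) is the zero vector.
assert all(sum(row) == 0 for row in A)
```

Row 2 of the result is (−1, 0, 1, 0), which records that edge 2 leaves node 1 and enters node 3.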

The graph in Figure 8.6 has the same four nodes but only three edges. Its incidence 
matrix is 3 by 4. The first graph is complete—every pair of nodes is connected by an edge. 
The second graph is a tree—the graph has no closed loops. Those graphs are the two 
extremes, with the maximum number of edges $m = \tfrac{1}{2}n(n - 1)$ and the minimum number
$m = n - 1$. We are assuming that the graph is connected, and it makes no fundamental
difference which way the arrows go. On each edge, flow with the arrow is “positive.” Flow 
in the opposite direction counts as negative. The flow might be a current or a signal or a 
force—or a measurement of the difference in height! 

$$
B = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix}
$$

Figure 8.6 Tree with 4 nodes. The rows of its incidence matrix B are edges 1 to 3 and the columns are nodes 1 to 4.

The rows of B match the nonzero rows of U, the echelon form found earlier. Elimination
reduces every graph to a tree. The first step subtracts row 1 from row 2, which
creates a copy of row 3. The second step creates a zero row, and the loop is gone:


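$$
\begin{bmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 1 & 0 \end{bmatrix}
\longrightarrow
\begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & -1 & 1 & 0 \end{bmatrix}
\longrightarrow
\begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
$$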


Those steps are typical. When two edges share a node, elimination produces the “shortcut 
edge” without that node. If we already have a copy of this shortcut edge, elimination 
removes it. When the dust clears we have a tree. If we renumber the edges, we may reach 
a different tree. The row space has many bases and the complete graph has $n^{n-2} = 4^2 = 16$
trees.


An idea suggests itself: Rows are dependent when edges form a loop. Independent
rows come from trees.
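
A quick numerical check of this idea, as a sketch only: the helper `rank` below is our own exact elimination routine (it is not one of the book's M-files), applied to the rows of the six-edge incidence matrix.

```python
from fractions import Fraction

def rank(rows):
    """Rank by exact Gaussian elimination on a copy of the rows."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        # Look for a nonzero pivot in column c at or below row r.
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * t for a, t in zip(M[i], M[r])]
        r += 1
    return r

A = [[-1, 1, 0, 0], [-1, 0, 1, 0], [0, -1, 1, 0],
     [-1, 0, 0, 1], [0, -1, 0, 1], [0, 0, -1, 1]]

loop_rows = [A[0], A[1], A[2]]   # edges 1, 2, 3 close the triangle on nodes 1, 2, 3
tree_rows = [A[0], A[2], A[5]]   # edges 1, 3, 6 form the tree 1-2-3-4

print(rank(loop_rows))  # 2: the three loop rows are dependent
print(rank(tree_rows))  # 3: the tree rows are independent
print(rank(A))          # 3 = n - 1
```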


The next step is to look at Ax, which is a vector of differences: 


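$$
Ax = \begin{bmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} x_2 - x_1 \\ x_3 - x_1 \\ x_3 - x_2 \\ x_4 - x_1 \\ x_4 - x_2 \\ x_4 - x_3 \end{bmatrix}.
\tag{8.26}
$$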


The unknowns x1, x2, x3, x4 represent potentials at the nodes. Then Ax gives the potential 
differences across the edges. It is these differences that cause flows. We now examine the 
meaning of each subspace. 


1 The nullspace contains the solutions to Ax = 0. All six potential differences are
zero. This means: All four potentials are equal. Every x in the nullspace is a constant
vector (c, c, c, c). The nullspace of A is a line in $\mathbf{R}^n$; its dimension is $n - r = 1$.


The second incidence matrix B has the same nullspace for the same reason. Another 
way to explain x = (1, 1, 1, 1): The columns add up to the zero column. 


We can raise or lower all potentials by the same amount c, without changing the 
differences. There is an “arbitrary constant” in the potentials. Compare this with the 
same statement for functions. We can raise or lower F(x) by the same amount C,
without changing its derivative. There is an arbitrary constant C in the integral. 


Calculus adds "+C" to indefinite integrals. Graph theory adds (c, c, c, c) to potentials.
Linear algebra adds any vector $x_n$ in the nullspace to one particular solution of
Ax = b.


The “+C” disappears in calculus when the integral starts at a known point x = a. 
Similarly the nullspace disappears when we set x4 = 0. The unknown x4 is removed 
and so are the fourth columns of A and B. Electrical engineers would say that node 4 
has been “grounded.” 


2 The row space contains all combinations of the six rows. Its dimension is certainly
not six. The equation r + (n − r) = n must be 3 + 1 = 4. The rank is r = 3, as we
also saw from elimination. Since each row of A adds to zero, this must be true for
every vector v in the row space.


3 The column space contains all combinations of the four columns. We expect three
independent columns, since there were three independent rows. The first three columns
are independent (so are any three). But the four columns add to the zero vector, which 
says again that (1, 1, 1, 1) is in the nullspace. How can we tell if a particular vector b 
is in the column space, and Ax = b has an exact solution? 


Kirchhoff's answer: Ax is the vector of differences in equation (8.26). If we add
differences around a closed loop in the graph, the cancellation leaves zero. Around 
the big triangle formed by edges 1, 3, —2 (the arrow goes backward on edge 2) the 
differences are 


$$(x_2 - x_1) + (x_3 - x_2) - (x_3 - x_1) = 0.$$


This is the voltage law: The components of Ax add to zero around every loop. 
When b is in the column space, it equals Ax for some x. Therefore b obeys the law: 


$$b_1 + b_3 - b_2 = 0.$$


By testing each loop, we decide whether b is in the column space. Ax = b can be 
solved exactly when the components of b satisfy all the same dependencies as the 
rows of A. Then elimination leads to 0 = 0, and Ax = b is consistent. 
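
The loop test can be sketched in code. The first loop below is the big triangle from the text; the second is the smaller loop on edges 1, 5, 4; the third (edges 2, 6, 4) is our own choice to complete a set of m − r = 3 independent loops. The function names and the sample potentials are illustrative assumptions.

```python
A = [[-1, 1, 0, 0], [-1, 0, 1, 0], [0, -1, 1, 0],
     [-1, 0, 0, 1], [0, -1, 0, 1], [0, 0, -1, 1]]

# Each loop is a vector of +1 (with the arrow), -1 (against it), 0 (edge not used).
LOOPS = [
    [1, -1, 1, 0, 0, 0],   # edges 1 and 3 forward, edge 2 backward
    [1, 0, 0, -1, 1, 0],   # edges 1 and 5 forward, edge 4 backward
    [0, 1, 0, -1, 0, 1],   # edges 2 and 6 forward, edge 4 backward
]

def loop_sums(b):
    """Oriented sum of the components of b around each loop."""
    return [sum(yi * bi for yi, bi in zip(y, b)) for y in LOOPS]

def in_column_space(b):
    """b = Ax is solvable exactly when every loop sum is zero."""
    return all(s == 0 for s in loop_sums(b))

x = [5, 7, 4, 0]                                               # arbitrary potentials
b_good = [sum(a * xi for a, xi in zip(row, x)) for row in A]   # b = Ax
b_bad = list(b_good)
b_bad[2] += 1                                                  # spoil the difference on edge 3

print(loop_sums(b_good), in_column_space(b_good))   # [0, 0, 0] True
print(loop_sums(b_bad), in_column_space(b_bad))     # [1, 0, 0] False
```

Three loops suffice here because the left nullspace has dimension m − r = 3, as the next paragraph explains.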


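4 The left nullspace contains the solutions to $A^T y = 0$. Its dimension is $m - r = 6 - 3$:

$$
A^T y = \begin{bmatrix} -1 & -1 & 0 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}.
\tag{8.27}
$$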


The true number of equations is r = 3 and not n = 4. Reason: The four equations 
above add to 0 = 0. The fourth equation follows automatically from the first three. 


The first equation says that $-y_1 - y_2 - y_4 = 0$. The net flow into node 1 is zero. The
fourth equation says that $y_4 + y_5 + y_6 = 0$. The net flow into node 4 is zero. The
word “net” means “flow into the node minus flow out.” That is zero at every node, 
and for currents the law has a name: 


Kirchhoff’s Current Law: Flow in equals flow out at each node. 


This law deserves first place among the equations of applied mathematics. It expresses
"conservation" and "continuity" and "balance." Nothing is lost, nothing is
gained. When currents or forces are in equilibrium, the equation to solve is $A^T y = 0$.
Notice the beautiful fact that the matrix in this balance equation is the transpose of
the incidence matrix A.


What are the actual solutions to $A^T y = 0$? The currents must balance themselves.
The easiest way is to flow around a loop. If a unit of current goes around the
big triangle (forward on edge 1, forward on 3, backward on 2), the current vector
is $y = (1, -1, 1, 0, 0, 0)$. This satisfies $A^T y = 0$. Every loop current yields a
solution y, because flow in equals flow out at every node. A smaller loop goes
forward on edge 1, forward on 5, back on 4. Then $y = (1, 0, 0, -1, 1, 0)$ is also in
the left nullspace.


We expect three independent y’s, since 6 —3 = 3. The three small loops in the graph 
are independent. The big triangle seems to give a fourth y, but it is the sum of flows 
around the small loops. 
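
These statements are easy to check by machine. The sketch below assumes (from the edge list, since the figure is not reproduced here) that the three small loops are the triangles through node 4; the helper name `At_times` is ours.

```python
A = [[-1, 1, 0, 0], [-1, 0, 1, 0], [0, -1, 1, 0],
     [-1, 0, 0, 1], [0, -1, 0, 1], [0, 0, -1, 1]]

def At_times(y):
    """Components of A^T y: flow into each node minus flow out of it."""
    return [sum(A[e][node] * y[e] for e in range(len(A))) for node in range(4)]

small_loops = [
    [1, 0, 0, -1, 1, 0],    # nodes 1 -> 2 -> 4 -> 1
    [0, 0, 1, 0, -1, 1],    # nodes 2 -> 3 -> 4 -> 2
    [0, -1, 0, 1, 0, -1],   # nodes 1 -> 4 -> 3 -> 1
]
big_triangle = [1, -1, 1, 0, 0, 0]   # nodes 1 -> 2 -> 3 -> 1

# Every loop current satisfies Kirchhoff's Current Law at all four nodes.
for y in small_loops + [big_triangle]:
    assert At_times(y) == [0, 0, 0, 0]

# The big triangle is the sum of the three small loops.
total = [sum(parts) for parts in zip(*small_loops)]
print(total)   # [1, -1, 1, 0, 0, 0]
```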


Summary The incidence matrix A comes from a connected graph with n nodes and m 
edges. Here are the four fundamental subspaces: 


1 The constant vectors (c, c, ..., c) make up the nullspace of A.
2 There are r = n − 1 independent rows, using edges from any tree.
3 Voltage law: The components of Ax add to zero around every loop.
4 Current law: $A^T y = 0$ is solved by loop currents. $N(A^T)$ has dimension m − r.
  There are m − r = m − n + 1 independent loops in the graph.


For a graph in a plane, the small loops are independent. Then linear algebra yields Euler’s 
formula: 


$$(\text{number of nodes}) - (\text{number of edges}) + (\text{number of small loops}) = n - m + (m - n + 1) = 1.$$


A single triangle has (3 nodes) − (3 edges) + (1 loop). For the graph in our example,
nodes − edges + loops becomes 4 − 6 + 3. On a 10-node tree Euler's count is 10 − 9 + 0.
All planar graphs lead to the same answer 1.
Networks and $A^T C A$


The current y along an edge is the product of two numbers. One number is the difference 
between the potentials x at the ends of the edge. This difference is Ax and it drives the 
flow. The other number is the “conductance” c—which measures how easily flow gets 
through. 

In physics and engineering, c is decided by the material. It is high for metal and 
low for plastics. For a superconductor, c is nearly infinite (for electrical current). If we 
consider elastic stretching, c might be low for metal and higher for plastics. In economics, 
c measures the capacity of an edge or its cost. 

To summarize, the graph is known from its "connectivity matrix" A. This tells the
connections between nodes and edges. A network goes further, and assigns a conductance
c to each edge. These numbers $c_1, \ldots, c_m$ go into the "conductance matrix" C,
which is diagonal.

For a network of resistors, the conductance is c = 1/(resistance). In addition to
Kirchhoff's laws for the whole system of currents, we have Ohm's law for each particular
current. Ohm's law connects the current $y_1$ on edge 1 to the potential difference $x_2 - x_1$
between the nodes:


current along edge = (conductance) times (potential difference). 


Ohm's law for all m currents is $y = -CAx$. The vector Ax gives the potential differences,
and C multiplies by the conductances. Combining Ohm's law with Kirchhoff's current law
$A^T y = 0$, we get $A^T C A x = 0$. This is almost the central equation for network flows. The
only thing wrong is the zero on the right side! The network needs power from outside (a
voltage source or a current source) to make something happen.


Note about applied mathematics Every new application has its own form of Ohm's law.
For elastic structures y = CAx is Hooke's law. The stress y is (elasticity C) times
(stretching Ax). For heat conduction, Ax is a temperature gradient and C is the conductivity. For
fluid flows Ax is a pressure gradient. There is a similar law for least-squares regression in
statistics, and this is at the heart of our book. C is the inverse of the covariance matrix $\Sigma_b$.

The textbook Introduction to Applied Mathematics (Wellesley-Cambridge Press) is
practically built on "$A^T C A$." This is the key to equilibrium. In geodesy, the measurement
of lengths will also fit this pattern. That is a 2-D or 3-D problem, where heights are 1-D. 
In fact the matrix A for lengths appears in the earlier book as the matrix for structures 
(trusses). Applied mathematics is more organized than it looks. 


We end by an example with a current source. Kirchhoff's law changes from $A^T y = 0$
to $A^T y = f$, to balance the source f from outside. Flow into each node still equals flow
out. Figure 8.7 shows the current source going into node 1. The source comes out at node 4 
to keep the balance (in = out). The problem is: Find the currents on the six edges. 


Figure 8.7 A network of conductors with a current source.

Example 8.2 All conductances are c = 1, so that C = I. A current $y_4$ travels directly
from node 1 to node 4. Other current goes the long way from node 1 to node 2 to node 4
(this is $y_1 = y_5$). Current also goes from node 1 to node 3 to node 4 (this is $y_2 = y_6$). We
can find the six currents by using special rules for symmetry, or we can do it right by using
$A^T C A$. Since C = I, this matrix is


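$$
A^T C A = A^T A =
\begin{bmatrix} -1 & -1 & 0 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 1 \end{bmatrix}
= \begin{bmatrix} 3 & -1 & -1 & -1 \\ -1 & 3 & -1 & -1 \\ -1 & -1 & 3 & -1 \\ -1 & -1 & -1 & 3 \end{bmatrix}.
$$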


That last matrix is not invertible! We cannot solve for all four potentials because (1, 1, 1, 1) 
is in the nullspace. One node has to be grounded. Setting x4 = 0 removes the fourth row 
and column, and this leaves a 3 by 3 invertible matrix. Now we solve $A^T C A x = f$ for the
unknown potentials x1, x2, x3. The right side f shows the source strength S into node 1: 


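$$
\begin{bmatrix} 3 & -1 & -1 \\ -1 & 3 & -1 \\ -1 & -1 & 3 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} S \\ 0 \\ 0 \end{bmatrix}
\quad \text{gives} \quad
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} S/2 \\ S/4 \\ S/4 \end{bmatrix}.
$$

From these potentials, Ohm's law $y = -CAx$ yields the six currents. Remember C = I:

$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{bmatrix}
= -\begin{bmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 1 \end{bmatrix}
\begin{bmatrix} S/2 \\ S/4 \\ S/4 \\ 0 \end{bmatrix}
= \begin{bmatrix} S/4 \\ S/4 \\ 0 \\ S/2 \\ S/4 \\ S/4 \end{bmatrix}.
$$

Figure 8.8 Projection of b on A's column space.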


Half the current goes directly on edge 4. That is $y_4 = S/2$. The rest travels along other
edges. No current crosses from node 2 to node 3. Symmetry indicated $y_3 = 0$ and now the
solution proves it.
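
The steps of this example can be reproduced with exact fractions. The code is a sketch under our own assumptions: the source strength is set to S = 1, and `solve` is a small Gauss-Jordan helper written for this illustration (not one of the book's M-files).

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve the square system M x = rhs exactly by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))   # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [a - f * t for a, t in zip(aug[i], aug[c])]
    return [row[n] for row in aug]

A = [[-1, 1, 0, 0], [-1, 0, 1, 0], [0, -1, 1, 0],
     [-1, 0, 0, 1], [0, -1, 0, 1], [0, 0, -1, 1]]
S = Fraction(1)                      # source strength, set to 1 for illustration

# With C = I the matrix A^T C A is just A^T A.
K = [[sum(row[i] * row[j] for row in A) for j in range(4)] for i in range(4)]

# Ground node 4: drop the fourth row and column, then solve for x1, x2, x3.
K3 = [row[:3] for row in K[:3]]
x = solve(K3, [S, 0, 0]) + [Fraction(0)]

# Ohm's law with C = I: y = -Ax.
y = [-sum(a * xi for a, xi in zip(row, x)) for row in A]

print([str(v) for v in x])   # ['1/2', '1/4', '1/4', '0']
print([str(v) for v in y])   # ['1/4', '1/4', '0', '1/2', '1/4', '1/4']
```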


A computer system for circuit analysis forms the matrix $A^T C A$. Then it solves
$A^T C A x = f$. This example shows how large networks are studied. The same matrix $A^T A$
appears in least squares. Nature distributes the currents to minimize the heat loss, where
statistics chooses $\hat{x}$ to minimize the error.


The Dual Problem 


All good minimization problems have (hidden in the shadows behind them) a dual problem. 
This dual is a new problem that looks very different from the primal, but it uses the same 
inputs. And in a subtle way it leads to the same solution. The route to that solution is often 
through Lagrange multipliers, which are introduced to handle the constraints (also called 
conditions) in an optimization problem. 

We describe first the dual to the direct unweighted problem of minimizing $\|b - Ax\|^2$.
This is the fundamental objective of least squares, and it is not constrained: every vector x
is admitted to the competition. Geometrically, this primal problem projects b onto the
subspace of all vectors Ax (the column space of A). The residual $e = b - A\hat{x}$ is normal to
the subspace, and this produces the normal equations!


$$A^T e = 0 \quad \text{or} \quad A^T A \hat{x} = A^T b.$$


Figure 8.8 shows the triangle of vectors $b = A\hat{x}$ (projection) $+ \, (b - A\hat{x})$ (error).

The dual problem simply projects the same vector b onto the perpendicular space.
That subspace, perpendicular to the column space of A, is perfectly described by the
Fundamental Theorem of Linear Algebra in Section 4.1. It is the nullspace of $A^T$. Its vectors
will be called y. Thus the dual problem is to find the vector y in the nullspace of $A^T$ that
is closest to b:

$$\text{Dual:} \quad \text{Minimize } \tfrac{1}{2}\|b - y\|^2 \quad \text{subject to} \quad A^T y = 0.$$

Figure 8.9 Projection of b onto column space of A and nullspace of $A^T$.


The constraint is $A^T y = 0$. This represents n equations in the m unknowns $y_1, \ldots, y_m$.
We suppose as usual that A is m by n, with m > n, and that its n columns are independent.
The rank is n (which makes $A^T A$ invertible).

Figure 8.9 shows the geometric solution to the dual problem as well as the primal.
The dual solution is exactly y = e. It is the other leg of the triangle. If e is computed first,
it reveals the primal leg $A\hat{x}$ as b − e. The "primal-dual" solution is the whole triangle, and
it includes both answers $x = \hat{x}$ and y = e.

How is the dual problem actually solved? We are minimizing but only over a subspace
of y's. The constraint $A^T y = 0$ (n linear equations) must be respected. It needs to be
built into the function before we can use calculus and set derivatives to zero. Constraints
are built in by introducing new unknowns $x_1, \ldots, x_n$ called Lagrange multipliers, one for
each constraint. It is the Lagrangian function L whose derivatives we actually set to zero:

$$L(x, y) = \tfrac{1}{2}\|b - y\|^2 + x^T (A^T y). \tag{8.28}$$


This is a function of m + n variables, x's and y's. It is linear in the x's. The derivatives
$\partial L / \partial x_j$ just recover the n constraints:

$$\frac{\partial L}{\partial x_j} = \text{coefficient of } x_j \text{ in (8.28)} = j\text{th component of } A^T y.$$


The n equations $\partial L / \partial x_j = 0$ are exactly the constraints $A^T y = 0$. The other m equations
come from the derivatives with respect to y:

$$L = \tfrac{1}{2}(b_1 - y_1)^2 + \cdots + \tfrac{1}{2}(b_m - y_m)^2 + (Ax)^T y$$

$$\frac{\partial L}{\partial y_j} = y_j - b_j + (j\text{th component of } Ax).$$

Setting these derivatives to zero gives the vector equation y + Ax = b. Again we see
our triangle. Note that the primal could be solved directly (no constraints and therefore
no Lagrange multipliers). The minimizing equation was $A^T A \hat{x} = A^T b$. In contrast, the
dual has dragged in x as a set of Lagrange multipliers, and produced a larger system of
equations:

$$
\begin{aligned}
y + Ax &= b \\
A^T y &= 0.
\end{aligned}
\tag{8.29}
$$

If the problem involves a covariance matrix $\Sigma$, it multiplies y in the first equation above,
because the dual problem is minimizing $(b - y)^T \Sigma \, (b - y)$.

Substituting y = b − Ax from the first equation into the second, we again find
$A^T A x = A^T b$. The dual problem leads to the same normal equation! Thus the Lagrange
multipliers $x = \hat{x}$ are computed first. In an important sense, they give the derivative of
the minimum value with respect to the constraints. This topic is further developed in the 
first author’s textbook Introduction to Applied Mathematics (Wellesley-Cambridge Press, 
1986) and in all serious texts on optimization. 
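
To see the primal-dual triangle in numbers, here is a small sketch. The 3 by 2 matrix A and the vector b are made up for illustration, and `solve` is our own exact elimination helper. The code solves the normal equations and the larger system (8.29), and checks that both give the same answer.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve the square system M x = rhs exactly by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))   # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [a - f * t for a, t in zip(aug[i], aug[c])]
    return [row[n] for row in aug]

A = [[1, 0], [1, 1], [1, 2]]      # made-up matrix with independent columns
b = [6, 0, 0]                     # made-up right side
m, n = 3, 2

# Primal: normal equations A^T A x = A^T b, then the residual e = b - A x.
AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
x_hat = solve(AtA, Atb)
e = [b[k] - sum(A[k][j] * x_hat[j] for j in range(n)) for k in range(m)]

# Dual: the (m + n) by (m + n) system (8.29) in the unknowns (y, x).
K = [[int(i == j) for j in range(m)] + A[i] for i in range(m)]       # rows [I  A]
K += [[A[k][j] for k in range(m)] + [0] * n for j in range(n)]       # rows [A^T  0]
sol = solve(K, b + [0] * n)
y, x = sol[:m], sol[m:]

print([str(v) for v in x_hat], [str(v) for v in e])   # ['5', '-3'] ['1', '-2', '1']
print([str(v) for v in x], [str(v) for v in y])       # ['5', '-3'] ['1', '-2', '1']
```

The dual unknown y equals the primal residual e, and the multipliers x equal the least-squares solution.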


The Solution to AT y = 0 


There is an alternative to Lagrange multipliers. We can try to solve directly the constraint
equations $A^T y = 0$. These are n equations in m unknowns. So there will be m − n degrees
of freedom in the solutions. Suppose we write B for a matrix whose l = m − n columns
are a basis for this solution subspace. Then B is m by l.

This approach has the advantage that the number of unknowns is reduced by n (using 
the n constraints) instead of increased by n (introducing Lagrange multipliers for those 
constraints). It has the disadvantage that we have to construct the basis matrix B. For 
the leveling problem, which is so closely related to the example of a network of resistors, 
we can give an explicit description of the solutions to $A^T y = 0$. This solution space
has l = m − n nice basis vectors. But even then, with the tempting possibility of fewer
unknowns, the dual problem of “adjustment by conditions” is very seldom used! 


Comment The situation is similar in the finite element method. There the unknowns x 
and y are the displacements and stresses. The primal method (displacement method) is 
chosen 99.9% of the time. The dual method (stress method) is much less popular—because 
it requires a basis matrix B of solutions to the equilibrium equations $A^T y = 0$. Notice that
in the finite element method, the original problem is expressed by differential equations. 
Then approximation by piecewise polynomials reduces to a discrete (finite matrix) formu- 
lation. In leveling, the problem is discrete in the first place. 


Now we identify the matrix B for a network of resistors (the electrical circuit example)
or of baselines (the leveling example). The columns of B are solutions to $A^T y = 0$.
This is Kirchhoff's Current Law: flow in = flow out at each node. It is clearly satisfied by
flow around a loop. A loop is a closed path of network edges (baselines). At each node
along the loop, one edge enters and another edge leaves. A unit current y around this loop
is described by a vector whose entries are +1 or −1 or zero:

$$
y_i = \begin{cases}
+1 & \text{if the loop follows the arrow on edge } i \\
-1 & \text{if the loop goes against the arrow on edge } i \\
\phantom{+}0 & \text{if the loop does not include edge } i.
\end{cases}
$$


There are l independent loops in the network, and they give l independent columns of B.
There remains the task of identifying l independent loops. For leveling this is easy; just
take the small loops displayed in the network. Then large loops involving many baselines 
will be combinations of these small basic loops. Example 8.5 describes the general code 
findloop. 

The general solution to $A^T y = 0$ is a combination of these loop vectors y in the
columns of B. Their combinations are all vectors $y = B\lambda$. The l = m − n unknowns are
the coefficients $\lambda_1, \ldots, \lambda_l$ and we now have an unconstrained minimization, since these
vectors $y = B\lambda$ satisfy the constraints. Instead of minimizing $\tfrac{1}{2}\|b - y\|^2$ with constraints
$A^T y = 0$, we solve the equivalent dual problem:

$$\text{Minimize } \tfrac{1}{2}\|b - B\lambda\|^2 \text{ over all vectors } \lambda. \tag{8.30}$$

The minimizing equation is the dual normal equation

$$B^T B \lambda = B^T b. \tag{8.31}$$


If there is a covariance matrix $\Sigma$ in the primal problem (which led to $A^T \Sigma^{-1} A$), then this
matrix appears in $B^T \Sigma B$. The dual energy or complementary energy to be minimized
is $\tfrac{1}{2}(b - B\lambda)^T \Sigma \, (b - B\lambda)$. The weights are $\Sigma$ not $\Sigma^{-1}$. The dual normal equations become
$B^T \Sigma B \lambda = B^T \Sigma b$.
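
A minimal sketch of the unweighted dual normal equation (8.31), using a made-up 3 by 2 matrix A whose left nullspace is a single line, so that B has one column and λ is one number. The data and variable names are illustrative assumptions.

```python
from fractions import Fraction

A = [[1, 0], [1, 1], [1, 2]]     # made-up matrix, m = 3, n = 2, so l = m - n = 1
b = [6, 0, 0]                    # made-up right side
B = [1, -2, 1]                   # the single column of B: a basis for the nullspace of A^T

# Check that the column of B really solves A^T y = 0.
assert all(sum(A[k][j] * B[k] for k in range(3)) == 0 for j in range(2))

# Dual normal equation B^T B lambda = B^T b (one equation, one unknown).
lam = Fraction(sum(Bk * bk for Bk, bk in zip(B, b)), sum(Bk * Bk for Bk in B))
y = [lam * Bk for Bk in B]       # dual solution y = B lambda

# The other leg of the triangle, b - y, must lie in the column space of A.
# For this data the primal solution is x = (5, -3), as the normal equations confirm.
x_hat = [5, -3]
Ax = [sum(A[k][j] * x_hat[j] for j in range(2)) for k in range(3)]

print(lam, [str(v) for v in y])                 # 1 ['1', '-2', '1']
print([bk - yk for bk, yk in zip(b, y)] == Ax)  # True
```

One unknown replaces two here. In a larger network the price is the construction of B.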

Now we consider the specific meaning of the unknowns $y_1, \ldots, y_m$ (and the conditioned
unknowns $\lambda_1, \ldots, \lambda_l$) for geodesy.


Example 8.3 A dual formulation of the leveling problem: adjustment by conditions. The 
idea is to make the loop law (Kirchhoff’s Voltage Law) fundamental instead of the current 
law. The sum of height differences around loops is zero, and known heights contribute to c: 


$$B^T (b + r) = c \quad \text{or} \quad B^T r = w = c - B^T b. \tag{8.32}$$


With w = 0 this is the loop law. In order to avoid confusion with the $A^T C A$ formulation
we have written B which has dimensions m by l; here l denotes the number of loops and
m the number of observations. Equation (8.32) is the condition equation.

The least-squares condition still is $r^T C r = \min$, but this is a constrained minimization.
The components of the vector r of height differences are subject to the conditions
$B^T r = w$ expressed by equation (8.32). We introduce the Lagrange multipliers
$\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_l)$, one multiplier for each constraint. Instead of minimizing $r^T C r$ we
work with the extended function $L = \tfrac{1}{2} r^T C r + \lambda^T (B^T r - w)$. The m + l unknowns,
namely r and $\lambda$, are determined so that the m + l partial derivatives of L are zero:

$$\frac{\partial L}{\partial r} = Cr + B\lambda = 0 \quad \text{and} \quad \frac{\partial L}{\partial \lambda} = B^T r - w = 0. \tag{8.33}$$
Note that the derivative with respect to $\lambda$ brings back the original constraint. Solving the
first equation (with $C = \Sigma^{-1}$) gives

$$r = -C^{-1} B \lambda = -\Sigma B \lambda. \tag{8.34}$$

This is known as the correlate equation. Substitute into $B^T r = w$ to reach

$$-B^T \Sigma B \lambda = w \tag{8.35}$$

which are the dual normal equations. These l equations yield $\lambda = -(B^T \Sigma B)^{-1} w$. From
the correlate equation (8.34), the estimated residuals are

$$\hat{r} = \Sigma B (B^T \Sigma B)^{-1} w. \tag{8.36}$$

With $w = c - B^T b$ we get the estimated observations $\hat{b} = A\hat{x}$:

$$\hat{b} = b - \hat{r} = \bigl(I - \Sigma B (B^T \Sigma B)^{-1} B^T\bigr) b - \Sigma B (B^T \Sigma B)^{-1} c. \tag{8.37}$$

One possible expression for the estimate of the weighted sum of squares of residuals is

$$\hat{r}^T C \hat{r} = \lambda^T B^T \Sigma B \lambda.$$


For completeness we give the expression for the estimate of the variance of unit weight.
This quantity will be defined in Chapter 9:

$$\hat{\sigma}_0^2 = \frac{\lambda^T B^T \Sigma B \lambda}{m - n}. \tag{8.38}$$


In most geodetic problems the number of redundant observations l = m − n is much
smaller than m. Rough calculations reveal that l/m = 0.1-0.2 in traverses, and l/m > 0.4
in plane networks. This implies that the number l of normal equations for the dual
least-squares formulation is much smaller than the number n for the standard formulation. This
fact was important in earlier times when the solution of the normal equations was the great 
obstacle in a least-squares procedure. Today this is of less importance. It is far more 
difficult to program a computer to set up the conditions than just to read the observation 
equations. So today most software uses the standard formulation as described in the rest 
of this book. 


Example 8.4 We shall illustrate this dual theory by means of the data in Example 8.1. 

There are m = 5 observations, n = 2 unknowns, hence l = m − n = 3 conditions.
Two of these conditions are of a special type as they involve points with fixed heights. Only 
the last condition (sum of height differences around a loop) is a typical one: 


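$$
\begin{aligned}
w_1 &= (H_A - H_C) + (b_1 - b_3) \\
w_2 &= (H_A - H_B) + (b_2 - b_4) \\
w_3 &= -b_1 + b_2 + b_5.
\end{aligned}
$$

The loops appear in the rows of $B^T$, and w is given by the problem:

$$
B^T = \begin{bmatrix} 1 & 0 & -1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 0 \\ -1 & 1 & 0 & 0 & 1 \end{bmatrix}
\quad \text{and} \quad
w = \begin{bmatrix} 0.009 \\ 0.012 \\ 0.012 \end{bmatrix}.
$$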


With $\sigma_0 = 1$ the covariance matrix comes from the distances $l_i$ between points:

$$\Sigma = C^{-1} = \operatorname{diag}(1.02,\ 0.97,\ 1.11,\ 1.07,\ 0.89).$$
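The solution of the dual normal equations (with minus sign) is

$$\lambda = \begin{bmatrix} -0.0069 \\ -0.0033 \\ -0.0055 \end{bmatrix}.$$

Then the residuals and estimated observations are given as

$$
\hat{r} = \begin{bmatrix} 0.0014 \\ 0.0085 \\ -0.0076 \\ -0.0035 \\ 0.0049 \end{bmatrix}
\quad \text{and} \quad
\hat{b} = b - \hat{r} = \begin{bmatrix} 1.9766 \\ 0.7235 \\ 0.9956 \\ 0.4235 \\ 1.2531 \end{bmatrix}.
$$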


The only ordinary loop condition is satisfied, too:

$$-\hat{b}_1 + \hat{b}_2 + \hat{b}_5 = 0.$$
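
The numbers of this example can be checked with a few lines of Python. This is a sketch only: `solve` is our own floating-point elimination helper, and the script uses just the data printed above ($B^T$, w, and the diagonal of $\Sigma$) together with equations (8.35) and (8.34).

```python
def solve(M, rhs):
    """Solve the square system M x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))   # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [a - f * t for a, t in zip(aug[i], aug[c])]
    return [row[n] for row in aug]

Bt = [[1, 0, -1, 0, 0], [0, 1, 0, -1, 0], [-1, 1, 0, 0, 1]]   # rows of B^T
w = [0.009, 0.012, 0.012]
sigma = [1.02, 0.97, 1.11, 1.07, 0.89]                        # diagonal of Sigma

# Dual normal matrix B^T Sigma B (3 by 3).
N = [[sum(Bt[i][k] * sigma[k] * Bt[j][k] for k in range(5)) for j in range(3)]
     for i in range(3)]

# Dual normal equations (8.35): -(B^T Sigma B) lambda = w.
lam = solve([[-v for v in row] for row in N], w)

# Correlate equation (8.34): r = -Sigma B lambda.
r = [-sigma[k] * sum(Bt[i][k] * lam[i] for i in range(3)) for k in range(5)]

print([round(v, 4) for v in lam])
print([round(v, 4) for v in r])
```

The computed multipliers and residuals should agree with the printed values to about four decimals, and the residuals satisfy $B^T r = w$.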


Example 8.5 In that example it was easy to recognize the independent loops from the
figure. For larger networks it is not always so simple to find a full set of loops. So we
look for an automatic way of creating independent loops.

Our starting point will be a list of edges. Each edge is described by the “from” node 
(which the edge leaves) and the “to” node (which it enters). The leveled height differences 
are known for all edges, and they should sum to zero around every loop. 

Each edge forms a row of the matrix A. We have just learned that the loops are the
special solutions to the equation $A^T y = 0$. So for given A we need a code to compute a
complete set of special solutions. These are vectors with $y_i = \pm 1$ when an edge is in the
loop and otherwise $y_i = 0$. This is done by the collection of M-files: looplist, findloop,
findnode, plu, and ref. (The file findloop is a renamed version of the Teaching Code called
null described in Section 3.2.) The M-file ref computes the reduced echelon form of $A^T$,
and findloop puts the special solutions of $A^T y = 0$ into the nullspace matrix B.

The output is a list of edges for each loop, and the loop sums that should be zero. By 
the M-file findnode we produce the sequence of nodes in each loop. Recall that a network 
has many complete sets of loops. Our particular set is determined by the order of the edges
and nodes in A. Repeated edges make a special case and they create no difficulties. 

A modified version (we have three differences instead of one) of the same code can 
be used to combine GPS baselines to form loops of 3-dimensional vectors! The M-script 
vectors contains a set of GPS vectors, the M-file v_loops does the computation and the 
result can be read in vectors.log. 
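
The idea behind these M-files (reduce $A^T$ to reduced echelon form, then read off one special solution per free column) can be sketched in Python. The code below is our own illustration and not a transcription of findloop or ref; the function names `rref` and `loops_from_edges` are hypothetical, and exact fractions stand in for MATLAB arithmetic.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced row echelon form of M and the list of pivot columns."""
    R = [[Fraction(v) for v in row] for row in M]
    pivots, r = [], 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue                      # free column
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [v / piv for v in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * t for a, t in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots

def loops_from_edges(edges, n_nodes):
    """Special solutions of A^T y = 0: one loop vector for each free column."""
    m = len(edges)
    At = [[0] * m for _ in range(n_nodes)]
    for e, (start, end) in enumerate(edges):
        At[start - 1][e] = -1             # edge e leaves this node
        At[end - 1][e] = 1                # edge e enters this node
    R, pivots = rref(At)
    loops = []
    for free in (c for c in range(m) if c not in pivots):
        y = [Fraction(0)] * m
        y[free] = Fraction(1)
        for row, pc in zip(R, pivots):
            y[pc] = -row[free]            # pivot variables follow from the free one
        loops.append([int(v) for v in y])
    return loops

edges = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4)]   # the six-edge graph
for y in loops_from_edges(edges, 4):
    print(y)
# [1, -1, 1, 0, 0, 0]
# [1, 0, 0, -1, 1, 0]
# [0, 1, 0, -1, 0, 1]
```

For the six-edge graph this returns m − n + 1 = 3 loops, and the first two are the loops found by hand earlier. A different ordering of the edges would produce a different (equally valid) set.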


On calculational accuracy The observed height differences $b_i$ are rounded to mm. It is
natural in all subsequent calculations to keep this accuracy. One extra digit is even
carried in the normal equations and their solution (and also in the standard deviations of the two
heights).

The weight matrix is indicated with 3 significant digits. In well conditioned problems 
the weights are not so decisive for the result, see Section 11.6. Besides, the determination 
of weights is not always a well defined matter. 


The node law $A^T C \hat{r} = 0$ in Example 8.1 is fulfilled at the free nodes, but the fixed
nodes cause a modification. The node law as derived in Section 8.3 works perfectly well for
a network without fixed nodes. As point of departure we arrange the observation equations
so that the free nodes are at the bottom of x. (The columns of the original incidence
matrix A are interchanged accordingly.) Columns in A referring to fixed nodes are contained
in $A_1$ and the fixed heights are denoted $x^0$:

$$Ax = \begin{bmatrix} A_1 & A_2 \end{bmatrix} \begin{bmatrix} x^0 \\ x \end{bmatrix} = b - r.$$

The unknown part x has

$$A_2 x = (b - A_1 x^0) - r. \tag{8.39}$$


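As usual, we form the normal equations by multiplication to the left by $A^T C$:

$$
\begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} C A_2 x
= \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} C (b - A_1 x^0)
- \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} C r.
$$

The lower equation is just the set of normal equations which we already solved. The upper
equation is

$$A_1^T C A_2 x = A_1^T C (b - A_1 x^0) - A_1^T C r.$$

Rearranged this becomes

$$
\begin{aligned}
A_1^T C r &= -A_1^T C A_2 x + A_1^T C (b - A_1 x^0) \\
&= A_1^T C (-A_2 x + b - A_1 x^0).
\end{aligned}
$$

Remembering y = Cr, we get

$$y = C (b - A_2 x - A_1 x^0). \tag{8.40}$$

Here C is m by m, b is m by 1, $A_2$ is m by (n − f), x is (n − f) by 1, $A_1$ is m by f, and $x^0$ is f by 1.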


The oriented sum of y’s along any path connecting two points with fixed heights equals 
the weight times the difference of observed height differences and fixed heights. 
Problem Set 8.4


[Figure: the triangle graph (edges 1, 2, 3) and the square graph with two loops, used in the problems below.]


Write down the 3 by 3 incidence matrix A for the triangle graph. The first row has
−1 in column 1 and +1 in column 2. What vectors $(x_1, x_2, x_3)$ are in its nullspace?
How do you know that (1, 0, 0) is not in its row space?


Write down $A^T$ for the triangle graph. Find a vector y in its nullspace. The components
of y are currents on the edges. How much current is going around the triangle?


Eliminate $x_1$ and $x_2$ from the third equation to find the echelon matrix U. What tree
corresponds to the two nonzero rows of U?

$$
\begin{aligned}
-x_1 + x_2 &= b_1 \\
-x_1 + x_3 &= b_2 \\
-x_2 + x_3 &= b_3.
\end{aligned}
$$


Choose a vector (b1, b2, b3) for which Ax = b can be solved, and another vector b 
that allows no solution. How are those b’s related to y = (1, —1, 1)? 


Choose a vector $(f_1, f_2, f_3)$ for which $A^T y = f$ can be solved, and a vector f
that allows no solution. How are those f's related to x = (1, 1, 1)? The equation
$A^T y = f$ is Kirchhoff's law.


Multiply matrices to find $A^T A$. Choose a vector f for which $A^T A x = f$ can be
solved, and solve for x. Put those potentials x and the currents y = −Ax and
current sources f onto the triangle graph. Conductances are 1 because C = I.


With conductances $c_1 = 1$ and $c_2 = c_3 = 2$, multiply matrices to find $A^T C A$. For
f = (1, 0, −1) find a solution to $A^T C A x = f$. Write the potentials x and currents
y = −CAx on the triangle graph, when the current source f goes into node 1 and
out from node 3.


Write down the 5 by 4 incidence matrix A for the square graph with two loops. Find
one solution to Ax = 0 and two solutions to $A^T y = 0$.

Find two requirements on the b's for the five differences $x_2 - x_1$, $x_3 - x_1$, $x_3 - x_2$,
$x_4 - x_2$, $x_4 - x_3$ to equal $b_1, b_2, b_3, b_4, b_5$. You have found Kirchhoff's ____ law
around the two ____ in the graph.


Reduce A to its echelon form U. The three nonzero rows give the incidence matrix 
for what graph? You found one tree in the square graph—find the other seven trees. 


Multiply matrices to find $A^T A$ and guess how its entries come from the graph:

(a) The diagonal of $A^T A$ tells how many ____ into each node.
(b) The off-diagonals −1 or 0 tell which pairs of nodes are ____.


Why is each statement true about $A^T A$? Answer for $A^T A$ not A.

(a) Its nullspace contains (1, 1, 1, 1). Its rank is n − 1.
(b) It is positive semidefinite but not positive definite.
(c) Its four eigenvalues are real and their signs are ____.

With conductances $c_1 = c_2 = 2$ and $c_3 = c_4 = c_5 = 3$, multiply the matrices
$A^T C A$. Find a solution to $A^T C A x = f = (1, 0, 0, -1)$. Write these potentials x
and currents y = −CAx on the nodes and edges of the square graph.


The matrix $A^T C A$ is not invertible. What vectors x are in its nullspace? Why does
$A^T C A x = f$ have a solution if and only if $f_1 + f_2 + f_3 + f_4 = 0$?


A connected graph with 7 nodes and 7 edges has how many loops? 


For the graph with 4 nodes, 6 edges, and 3 loops, add a new node. If you connect it 
to one old node, Euler’s formula becomes ( )—( )+( ) = 1. If you connect it 
to two old nodes, Euler’s formula becomes ( )—( )+( )=1. 


Suppose A is a 12 by 9 incidence matrix from a connected (but unknown) graph.

(a) How many columns of A are independent?
(b) What condition on f makes it possible to solve $A^T y = f$?
(c) The diagonal entries of $A^T A$ give the number of edges into each node. What
    is the sum of those diagonal entries?


Why does a complete graph with n = 6 nodes have m = 15 edges? A tree connecting
6 nodes has ____ edges.


8.5 One-Dimensional Distance Networks 


There are a few other linear least-squares problems in geodesy. The following section 
touches upon two of them. Chapter 13 will discuss other cases. 




Example 8.6 The simplest linear situation in surveying is the following: Between 4 points 
A, B, C, and D situated on a straight line we have measured the distances AB, BC, CD, 
AC, AD, and BD. The six measurements are 


$$(b_1, b_2, \ldots, b_6) = (3.17,\ 1.12,\ 2.25,\ 4.31,\ 6.51,\ 3.36).$$


A similar problem concerning directions is the so-called station adjustment which is treated 
in Section 13.3. 

As unknowns we choose AB = $x_1$ and BC = $x_2$ and CD = $x_3$. We have m = 6
and n = 3. The weights are set to unity so C = I. The observation equations then are


$$
\begin{aligned}
x_1 &= b_1 - r_1 \\
x_2 &= b_2 - r_2 \\
x_3 &= b_3 - r_3 \\
x_1 + x_2 &= b_4 - r_4 \\
x_1 + x_2 + x_3 &= b_5 - r_5 \\
x_2 + x_3 &= b_6 - r_6.
\end{aligned}
$$


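In matrix form this is

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 3.17 \\ 1.12 \\ 2.25 \\ 4.31 \\ 6.51 \\ 3.36 \end{bmatrix}
- \begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \\ r_6 \end{bmatrix}.$$

The normal equations become

$$\underbrace{\begin{bmatrix} 3 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}}_{A^{T}A}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 13.99 \\ 15.30 \\ 12.12 \end{bmatrix}.$$

The solution is

$$\hat{x} = \begin{bmatrix} 3.170 \\ 1.122 \\ 2.235 \end{bmatrix}.$$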
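
The numbers of Example 8.6 can be checked with a few lines of MATLAB. This is a minimal sketch and not one of the book's M-files; the variable names `A`, `b`, `N`, `rhs`, `x`, `r` are chosen here for illustration only.

```matlab
% Example 8.6: distances on a straight line, unit weights (C = I)
A = [1 0 0; 0 1 0; 0 0 1; 1 1 0; 1 1 1; 0 1 1];   % rows: AB, BC, CD, AC, AD, BD
b = [3.17; 1.12; 2.25; 4.31; 6.51; 3.36];         % the six measured distances

N   = A' * A;        % normal-equation matrix A'A
rhs = A' * b;        % right side A'b
x   = N \ rhs;       % estimates of x1 = AB, x2 = BC, x3 = CD
r   = b - A * x;     % residuals, orthogonal to the columns of A

disp(x')             % approximately 3.1700  1.1225  2.2350
```

The residual vector `r` satisfies `A' * r = 0` up to roundoff, which is the projection property behind the normal equations.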


Example 8.7 Next we shall modify the situation described in Example 8.6 by introducing an extra unknown $z$, the zero constant. It is due to the fact that the zero mark of the measuring tape is possibly not situated exactly at the physical end point. So we want to determine $z$ along with the earlier 3 unknowns.

[Figure: the points A, B, C, D on a straight line, with the measured distances AB = 3.17, BC = 1.12, CD = 2.25, and AD = 6.51 marked.]

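The new version of the observation equations looks like

$$\begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 1 & 1 & 0 & -1 \\ 1 & 1 & 1 & -1 \\ 0 & 1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ z \end{bmatrix}
= \begin{bmatrix} 3.17 \\ 1.12 \\ 2.25 \\ 4.31 \\ 6.51 \\ 3.36 \end{bmatrix}
- \begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \\ r_6 \end{bmatrix}.$$

The modified normal equations are

$$\begin{bmatrix} 3 & 2 & 1 & -3 \\ 2 & 4 & 2 & -4 \\ 1 & 2 & 3 & -3 \\ -3 & -4 & -3 & 6 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ z \end{bmatrix}
= \begin{bmatrix} 13.99 \\ 15.30 \\ 12.12 \\ -20.72 \end{bmatrix}.$$

The solution is

$$\hat{x} = \begin{bmatrix} 3.1625 \\ 1.1150 \\ 2.2275 \end{bmatrix} \quad\text{and}\quad \hat{z} = -0.0150.$$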
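
As a hedged illustration, Example 8.7 differs from Example 8.6 only by one extra column of $-1$'s in the coefficient matrix. The sketch below (illustrative variable names, not an M-file from the book) appends that column and solves the modified normal equations.

```matlab
% Example 8.7: add the zero constant z as a fourth unknown (column of -1)
A0 = [1 0 0; 0 1 0; 0 0 1; 1 1 0; 1 1 1; 0 1 1];   % matrix of Example 8.6
b  = [3.17; 1.12; 2.25; 4.31; 6.51; 3.36];
A  = [A0, -ones(6, 1)];       % each observation now involves -z

sol = (A' * A) \ (A' * b);    % solve the modified normal equations
x = sol(1:3);                 % distances AB, BC, CD
z = sol(4);                   % zero constant of the tape

fprintf('x = %.4f %.4f %.4f, z = %.4f\n', x, z);
% prints: x = 3.1625 1.1150 2.2275, z = -0.0150
```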


Example 8.8 A yet more comprehensive model also allows for a change of scale $k$ of the measuring tape. The modified observation equations are

$$\begin{aligned}
k x_1 + z &= b_1 - r_1 \\
k x_2 + z &= b_2 - r_2 \\
k x_3 + z &= b_3 - r_3 \\
k x_1 + k x_2 + z &= b_4 - r_4 \\
k x_1 + k x_2 + k x_3 + z &= b_5 - r_5 \\
k x_2 + k x_3 + z &= b_6 - r_6.
\end{aligned}$$

This problem is not linear. Products $k x_1$, $k x_2$, $k x_3$ of unknowns appear. How to solve this type of problem is the subject of Chapter 10.


RANDOM VARIABLES AND 
COVARIANCE MATRICES 


9.1 The Normal Distribution and $\chi^2$


A great deal of practical geodesy consists in taking observations. The observed values are 
real numbers. But they are not exact; they include “random” errors. In order to analyze 
these values we take them to be continuous random variables. In distinction to a real 
variable, a random variable is furnished with a probability distribution. Often, the various 
values of X are not equally probable. 

The probability of hitting a particular real number $X = x$ is zero. Instead we introduce a probability density $p(x)$. The chance that a random $X$ falls between $a$ and $b$ is found by integrating the density $p(x)$:

$$\operatorname{Prob}(a \le X \le b) = \int_a^b p(x)\,dx. \tag{9.1}$$

Roughly speaking, $p(x)\,dx$ is the chance of falling between $x$ and $x + dx$. Certainly $p(x) \ge 0$. If $a$ and $b$ are the extreme limits $-\infty$ and $\infty$, including all possible outcomes, the probability is necessarily one:

$$\operatorname{Prob}(-\infty < X < +\infty) = \int_{-\infty}^{\infty} p(x)\,dx = 1. \tag{9.2}$$

The most important density function is the normal distribution, which is defined as

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-(x-\mu)^2/(2\sigma^2)}. \tag{9.3}$$

Equation (9.3) involves two parameters. They are the mean value $\mu$ and the standard deviation $\sigma$ (the standard deviation measures the spread around the mean value). Often the two parameters are given the "standardized" values $\mu = 0$ and $\sigma = 1$. Any normal distribution (9.3) can be standardized by the substitution $y = (x - \mu)/\sigma$. Then $p(x)$ becomes symmetric around $x = 0$ and the variance is $\sigma^2 = 1$:

$$p(x) = \frac{1}{\sqrt{2\pi}}\; e^{-x^2/2}. \tag{9.4}$$

[Figure 9.1, two panels: the bell-shaped density $p(x)$ and the cumulative density $F(x) = \int_{-\infty}^{x} p(x)\,dx$, which takes the values 0.02, 0.16, 0.50, 0.84, 0.98 at $\mu - 2\sigma$, $\mu - \sigma$, $\mu$, $\mu + \sigma$, $\mu + 2\sigma$. The horizontal axes are labeled both $-2, -1, 0, 1, 2$ and $\mu - 2\sigma, \ldots, \mu + 2\sigma$.]

Figure 9.1 The normal distribution $p(x)$ (bell-shaped) and its cumulative density $F(x)$: $p(x)$ is given in (9.3)-(9.4) and there is no explicit formula for its integral $F(x)$.


The factor $\sqrt{2\pi}$ is included to make $\int p(x)\,dx = 1$.

The bell-shaped graph of $p(x)$ in Figure 9.1 is symmetric around the middle point $x = \mu$. The width of the graph is governed by the second parameter $\sigma$, which stretches the $x$ axis and shrinks the $y$ axis (leaving total area equal to 1). The axes are labeled to show the standard case $\mu = 0$, $\sigma = 1$ and also the graph of $p(x)$ for any other $\mu$ and $\sigma$.

We now give a name to the integral of $p(x)$. The limits will be $-\infty$ and $x$, so the integral $F(x)$ measures the probability that a random sample is below $x$:

$$\operatorname{Prob}(X \le x) = \int_{-\infty}^{x} p(t)\,dt = \text{cumulative density function } F(x). \tag{9.5}$$

$F(x)$ accumulates the probabilities given by $p(x)$, so $dF(x)/dx = p(x)$. The reverse of the integral is the derivative. The total probability is $F(\infty) = 1$. This integral from $-\infty$ to $\infty$ covers all outcomes.

Figure 9.1b shows the integral of the bell-shaped normal distribution. The middle point $x = \mu$ has $F = \tfrac12$. By symmetry, there is a 50-50 chance of an outcome below the mean. The cumulative density $F(x)$ is near 0.16 at $\mu - \sigma$ and near 0.84 at $\mu + \sigma$. The chance of falling in the interval $[\mu - \sigma, \mu + \sigma]$ is $0.84 - 0.16 = 0.68$. Thus 68% of the outcomes are less than one deviation $\sigma$ from the center $\mu$.

Moving out to $\mu - 1.96\sigma$ and $\mu + 1.96\sigma$, 95% of the area is in between. With 95% confidence $X$ is less than two deviations from the mean. Only one outcome in 20 is further out (less than one in 40 on each side). This 95% confidence interval is often taken as an indicator. When an observation is outside this interval, when it is more than $2\sigma$ away from $\mu$, we may accept that this happened with probability below 5% or we may question whether the estimated values for $\sigma$ and $\mu$ are correct.


Table 9.1 Cumulative density function $F(x)$ for the normal distribution

    x       0.0     0.5     1.0     1.5     2.0     2.5     3.0
    F(x)    0.5     0.6915  0.8413  0.9332  0.9773  0.9938  0.9987


Table 9.2 One-dimensional normal distribution: Confidence intervals

    Interval           Probability     Probability    Interval
    |x - μ| < σ        0.6827          0.50           |x - μ| < 0.674σ
    |x - μ| < 2σ       0.9545          0.95           |x - μ| < 1.960σ
    |x - μ| < 3σ       0.9973          0.99           |x - μ| < 2.576σ


Geodesy frequently sets the standard error further out, at $\mu + 3\sigma$. The cumulative density at that point is $F = 0.9987$. Thus the probability of a (one-dimensional) sample further than three standard deviations from the mean (above or below) is $2(1 - F) = 0.0027$. Tables 9.1 and 9.2 show cumulative probabilities at important breakpoints.

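The integral of $p(x)$ from $a$ to $b$ is the probability that a random sample $X$ will lie in this interval:

$$\operatorname{Prob}(a \le X \le b) = \operatorname{Prob}(X \le b) - \operatorname{Prob}(X \le a) = F(b) - F(a).$$

For a random variable $X$ which is normally distributed with $\mu = 1$ and $\sigma = 2$ we standardize the limits and use the values of Table 9.1:

$$\operatorname{Prob}(2 \le X \le 3) = F\!\left(\tfrac{3-1}{2}\right) - F\!\left(\tfrac{2-1}{2}\right) = F(1) - F\!\left(\tfrac12\right) = 0.8413 - 0.6915 = 0.1498.$$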
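
Although $F(x)$ has no elementary formula, it can be evaluated through the error function, $F(x) = \tfrac12\bigl(1 + \operatorname{erf}(x/\sqrt2)\bigr)$ in the standardized case. The following sketch (the handle name `F` and the other variables are illustrative assumptions) recomputes Table 9.1 and the probability just found.

```matlab
% Cumulative normal distribution through the error function:
% F(x) = (1 + erf(x / sqrt(2))) / 2 for mu = 0, sigma = 1
F = @(x) 0.5 * (1 + erf(x / sqrt(2)));

xs  = 0:0.5:3;
tab = F(xs);                     % the seven entries of Table 9.1

% Prob(2 <= X <= 3) when X is normal with mu = 1 and sigma = 2
mu = 1;  sigma = 2;
p = F((3 - mu) / sigma) - F((2 - mu) / sigma);

fprintf('%6.4f ', tab);  fprintf('\n');
fprintf('Prob(2 <= X <= 3) = %.4f\n', p);
```

The unrounded result is about 0.1499; the value 0.1498 in the text comes from subtracting the four-decimal table entries.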


If $x_1, \ldots, x_n$ are independent normally distributed random variables each with mean $\mu$ and variance $\sigma_0^2$, then $\bar{x} = \sum x_i/n$ is still normally distributed with mean $\mu$. The variance is $\sigma_0^2/n$.


Two-dimensional Normal Distribution 


In two dimensions we have probability densities $p(x, y)$. Again the integral over all possibilities, $-\infty < x < \infty$ and $-\infty < y < \infty$, is 1. The normal distribution is always of the greatest importance, and we write the standardized form first. The mean is $\mu = (0, 0)$ and the 2 by 2 covariance matrix is $\Sigma = I$:

$$p(x, y) = \frac{1}{2\pi}\; e^{-(x^2 + y^2)/2}. \tag{9.6}$$


The M-file twod generated the 100 points $(x, y)$ in Figure 9.2 in accordance with this two-dimensional distribution. (You can obtain your own different set of 100 points.) The figure shows the circle of radius $2\sigma = 2$ around the mean. In the two-dimensional normal distribution, only 86.47% of the sample points are expected to be in this circle. This is because $\iint_{\text{circle}} p(x, y)\,dx\,dy = 0.8647$. We count 16 points outside the circle in this particular sample.
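
The listing of the M-file twod is not reproduced here, so the following is only a hypothetical stand-in for the same experiment: draw standardized pairs $(x, y)$ and count how many fall inside the circle of radius 2. The sample size `N` is an assumption, chosen larger than 100 so that the observed fraction settles near the expected value.

```matlab
% Fraction of standardized two-dimensional normal samples inside the
% circle of radius 2 (illustrative stand-in, not the M-file twod itself)
N  = 100000;                 % many more points than the 100 in Figure 9.2
xy = randn(N, 2);            % independent N(0,1) coordinates x and y
d2 = sum(xy.^2, 2);          % squared distances x^2 + y^2

inside   = mean(d2 <= 4);    % observed fraction with distance <= 2
expected = 1 - exp(-2);      % integral of the density over the circle

fprintf('observed %.4f, expected %.4f\n', inside, expected);
```

With only 100 points the count outside the circle fluctuates by several points from run to run, so a count such as 16 against an expectation of about 13.5 is unremarkable.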

Figure 9.2 The two-dimensional normal distribution and the circle of radius $2\sigma$.

Table 9.6 will show the corresponding numbers for the circles of radius $\sigma$ and $3\sigma$. The squared distance $x^2 + y^2$ is governed by a $\chi^2$ distribution, which has the simple form $p_2(s) = \tfrac12 e^{-s/2}$. The table also shows the (non-integer) multiples of $\sigma$ that will give 90%, 95%, and 99% confidence intervals around the mean $\mu$ (assuming we know $\mu$).
We get the marginal distribution of $y$ when we integrate over all $x$:

$$M(y) = \int_{-\infty}^{\infty} p(x, y)\,dx. \tag{9.7}$$


In this instance $M(y)$ is simply the one-dimensional normal distribution for $y$. That is because $p(x, y)$ in (9.6) separates into a product $p(x)p(y)$. The variables $x$ and $y$ are uncorrelated; the covariance matrix $\Sigma$ is diagonal. But certainly we meet many situations in which correlation is present and $\Sigma$ is not diagonal. This highly important idea is defined and developed in Section 9.3. We give here the normal distribution when the covariance matrix $\Sigma$ has diagonal entries $\sigma_x^2$ and $\sigma_y^2$ and off-diagonal entry $\Sigma_{12} = \Sigma_{21} = \sigma_{xy}$:

$$p(x, y) = \frac{1}{2\pi\sqrt{\det\Sigma}}\; e^{-[\,x\ \ y\,]\,\Sigma^{-1}\,[\,x\ \ y\,]^{T}/2}. \tag{9.8}$$

If the mean is moved from $(0, 0)$ to $(\mu_x, \mu_y)$, then replace $x$ and $y$ by $x - \mu_x$ and $y - \mu_y$.


$\chi^2$ Distribution


Next we investigate the probability distribution for a random variable defined as the sum of squares of the Gaussian random variables $x_1, \ldots, x_n$:

$$\chi_n^2 = x_1^2 + x_2^2 + \cdots + x_n^2.$$

We assume that the $x_i$ are independent and normally distributed with $\mu = 0$ and $\sigma = 1$.

The random variable $\chi_n^2$ has a geometrical interpretation: $\chi_n$ is the distance from the origin to a point with coordinates $(x_1, \ldots, x_n)$.

In Example 9.1 we derive the density function for $\chi_1^2 = x_1^2$ but already now we want to present the general expression for every $n$:

$$p_n(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\; x^{(n/2)-1} e^{-x/2}, \qquad x > 0. \tag{9.9}$$


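For small values of $n$ this $\chi^2$ distribution is strongly unsymmetric. Its mean value by (9.22) is $\mu = n$:

$$\begin{aligned}
E\{\chi_n^2\} &= \frac{1}{2^{n/2}\,\Gamma(n/2)} \int_0^\infty x\, e^{-x/2} x^{(n/2)-1}\,dx
= \frac{1}{2^{n/2}\,\Gamma(n/2)} \int_0^\infty e^{-x/2} x^{n/2}\,dx \\
&= \frac{\Gamma\bigl(1 + (n/2)\bigr)\, 2^{1+(n/2)}}{2^{n/2}\,\Gamma(n/2)}
= \frac{(n/2)\,\Gamma(n/2)\, 2^{1+(n/2)}}{2^{n/2}\,\Gamma(n/2)} = n.
\end{aligned} \tag{9.10}$$

Similarly we calculate the variance as $2n$:

$$\begin{aligned}
\sigma^2(\chi_n^2) &= \frac{1}{2^{n/2}\,\Gamma(n/2)} \int_0^\infty x^2 e^{-x/2} x^{(n/2)-1}\,dx - (\text{mean})^2 \\
&= \frac{\bigl(1 + (n/2)\bigr)(n/2)\,\Gamma(n/2)\, 2^{2+(n/2)}}{2^{n/2}\,\Gamma(n/2)} - n^2
= (n + 2)n - n^2 = 2n.
\end{aligned} \tag{9.11}$$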


For $n \to \infty$, the distribution of $\chi_n^2/n$ tends to the normal distribution with mean 1 and variance $\sigma^2 = 2/n$. Figure 9.3 shows $p_n(x)$ for $n = 1, 2, 3, 4$, and 10.
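
The mean $n$ and variance $2n$ can also be checked by simulation. The sketch below is an illustration under stated assumptions (the choices $n = 4$ and 200000 samples are arbitrary): it forms sums of squares of standardized normal numbers and compares the sample mean and variance with (9.10) and (9.11).

```matlab
% Monte Carlo check that chi-square with n degrees of freedom has
% mean n and variance 2n
n = 4;                            % degrees of freedom (illustrative choice)
N = 200000;                       % number of simulated samples
chi2 = sum(randn(n, N).^2, 1);    % each column gives x1^2 + ... + xn^2

m = mean(chi2);                   % should be close to n
v = var(chi2);                    % should be close to 2n
fprintf('mean %.3f (n = %d), variance %.3f (2n = %d)\n', m, n, v, 2*n);
```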


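Example 9.1 Derivation of equation (9.9). Start with the distribution of $\chi_1^2 = x_1^2$:

$$\begin{aligned}
\operatorname{Prob}(x < \chi_1^2 < x + dx) &= \operatorname{Prob}\bigl(\sqrt{x} < x_1 < \sqrt{x + dx}\bigr) + \operatorname{Prob}\bigl(-\sqrt{x + dx} < x_1 < -\sqrt{x}\bigr) \\
&= 2 \operatorname{Prob}\bigl(\sqrt{x} < x_1 < \sqrt{x + dx}\bigr).
\end{aligned}$$

We expand $\sqrt{x + dx}$ into a Taylor series $\sqrt{x} + dx/(2\sqrt{x}) + \cdots$ to obtain

$$2 \operatorname{Prob}\bigl(\sqrt{x} < x_1 < \sqrt{x + dx}\bigr) \approx 2 \operatorname{Prob}\Bigl(\sqrt{x} < x_1 < \sqrt{x} + \frac{dx}{2\sqrt{x}}\Bigr)
= \frac{2}{\sqrt{2\pi}}\, e^{-x/2}\, \frac{dx}{2\sqrt{x}} = \frac{1}{\sqrt{2\pi x}}\, e^{-x/2}\,dx.$$

This informal argument agrees with (9.9) since $\Gamma(\tfrac12) = \sqrt{\pi}$:

$$p_1(x) = \frac{1}{\sqrt{2\pi x}}\, e^{-x/2} \qquad \text{for } x > 0. \tag{9.12}$$

Figure 9.3 Probability density $p_n(\chi^2)$ with $n = 1, 2, 3, 4, 10$ degrees of freedom.

Alternatively one could start from the cumulative distribution for $\chi_1^2$:

$$\operatorname{Prob}(\chi_1^2 < x) = \operatorname{Prob}(-\sqrt{x} < x_1 < \sqrt{x}) = 2F(\sqrt{x}) - 1.$$

Then formally $p_1(x)$ is the derivative:

$$p_1(x) = \frac{d}{dx}\bigl(2F(\sqrt{x}) - 1\bigr) = \frac{1}{\sqrt{2\pi x}}\, e^{-x/2}.$$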
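We intend to prove by induction the general expression

$$p_n(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\; x^{(n-2)/2} e^{-x/2} \qquad \text{for } x > 0. \tag{9.13}$$

We start by assuming that equation (9.13) is valid for the indices $1, 2, \ldots, n$ and want to prove it for $n + 1$. According to the definition, $\chi_n^2 = x_1^2 + \cdots + x_n^2$ with independent terms. Consequently $\chi_{n+1}^2 = \chi_n^2 + x_{n+1}^2$. Again $x_{n+1}^2$ and $\chi_n^2$ are independent. Now we need the convolution theorem for probability densities: The probability density function of the sum of two independent random variables is the convolution of their probability density functions:

$$\begin{aligned}
p_{n+1}(x) &= \int_0^x p_n(y)\, p_1(x - y)\,dy = c_n \int_0^x y^{(n/2)-1} e^{-y/2}\, \frac{e^{-(x-y)/2}}{(x - y)^{1/2}}\,dy \\
&= c_n\, e^{-x/2} \int_0^x \frac{y^{(n/2)-1}}{(x - y)^{1/2}}\,dy \\
&= c_n\, e^{-x/2}\, x^{(n-1)/2} \int_0^1 \frac{t^{(n/2)-1}}{(1 - t)^{1/2}}\,dt = C_n\, x^{(n-1)/2} e^{-x/2}.
\end{aligned} \tag{9.14}$$

Here $c_n$ collects the constant factors of $p_n$ and $p_1$, the substitution $y = xt$ gives the last line, and $C_n$ absorbs the remaining integral, which does not depend on $x$.

We determine the constant $C_n$ so that the total probability is 1:

$$\int_0^\infty p_{n+1}(x)\,dx = C_n \int_0^\infty x^{(n-1)/2} e^{-x/2}\,dx = 1.$$

Replacing $x/2$ by $t$, the integral becomes the gamma function:

$$C_n\, 2^{(n+1)/2}\, \Gamma\bigl(\tfrac{n+1}{2}\bigr) = 1.$$

Substituting for $C_n$, equation (9.14) becomes

$$p_{n+1}(x) = \frac{1}{2^{(n+1)/2}\,\Gamma\bigl((n+1)/2\bigr)}\; x^{(n-1)/2} e^{-x/2}. \tag{9.15}$$

Compared with (9.13), all indices have been incremented by exactly one. Therefore (9.13) is valid for all positive integers $n$.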


In many applications the cumulative function $K_n(x)$ is more useful than $p_n(x)$ itself:

$$K_n(x) = \operatorname{Prob}(\chi_n^2 < x) = \int_0^x p_n(t)\,dt. \tag{9.16}$$

$K_n(x)$ gives the probability that the square sum of $n$ normally distributed random variables $x_i$ is smaller than a given number $x$. This function is tabulated for various values of $n$. Without the factor $1/\bigl(2^{n/2}\,\Gamma(n/2)\bigr)$, $K_n(x)$ is called the incomplete gamma function.
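
In MATLAB the tabulated function can be replaced by a call to `gammainc`. The substitution $u = t/2$ in (9.16) shows that $K_n(x)$ is the regularized lower incomplete gamma function with arguments $x/2$ and $n/2$; the handle name `K` below is an illustrative assumption.

```matlab
% K_n(x) = Prob(chi_n^2 < x) as a regularized incomplete gamma function.
% gammainc(x, a) divides the lower incomplete gamma integral by gamma(a),
% so the factor 1/(2^(n/2) Gamma(n/2)) is already accounted for.
K = @(x, n) gammainc(x / 2, n / 2);

k2 = K(4, 2);        % n = 2: equals 1 - exp(-4/2)
k1 = K(3.84, 1);     % n = 1: close to 0.95

fprintf('K_2(4) = %.4f, K_1(3.84) = %.4f\n', k2, k1);
```

The first value reproduces the 86.47% found for the circle of radius 2, and the second matches the 95% fractile 3.8 for one degree of freedom.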

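Often one encounters distribution functions closely related to $\chi^2$. For example if $\lambda = y_1^2 + \cdots + y_n^2$ with $E\{y_i\} = 0$ and $E\{y_i^2\} = \sigma^2$ then $\lambda = \sigma^2 \chi_n^2$. We will also meet random variables of the form

$$\frac{\lambda}{n} = \frac{1}{n}\bigl(y_1^2 + \cdots + y_n^2\bigr), \qquad
\sqrt{\lambda} = \sqrt{y_1^2 + \cdots + y_n^2}, \qquad
\sqrt{\frac{\lambda}{n}} = \sqrt{\frac{1}{n}\bigl(y_1^2 + \cdots + y_n^2\bigr)}.$$

All these variables have density distributions related to $p_n(x)$. We gather them in Table 9.3.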


Statistical test theory assumes that we can present two clear-cut alternative hypotheses. In geodesy this is seldom possible. So instead the concept of confidence regions is much more used, and that brings the $\chi^2$ distribution into play.

In particular, the random variable $(n - 1)\hat\sigma^2/\sigma_0^2$ using the sample variance $\hat\sigma^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$ follows a $\chi^2$ distribution with $n - 1$ degrees of freedom.

Remember $K_n(x)$ is a probability. When this probability is fixed at some specific value $K_n(x) = P\%$ the corresponding value for $x$ is known as the $P\%$-fractile. For the $\chi^2$ distribution with $n$ degrees of freedom the $P\%$-fractile is represented by the symbol $\chi^2_{n,P}$. Values of $\chi^2_{n,P}$ are given in Table 9.4.

By means of the $\chi^2$ distribution and the estimated variance $\hat\sigma^2$ we can derive confidence intervals for the (theoretical) variance $\sigma_0^2$. Estimates are usually denoted by a hat ( $\hat{}$ ).


Table 9.3 Random variables and their probability densities

    Random variable                          Probability density when $y_i$ are $N(0, \sigma^2)$

    $\lambda = \sum y_i^2$                   $\dfrac{1}{2^{n/2}\,\Gamma(n/2)\,\sigma^n}\; y^{(n/2)-1} e^{-y/(2\sigma^2)}$

    $\lambda/n = \frac1n \sum y_i^2$         $\dfrac{(n/2)^{n/2}}{\Gamma(n/2)\,\sigma^n}\; y^{(n/2)-1} e^{-ny/(2\sigma^2)}$

    $\sqrt{\lambda}$                         $\dfrac{2}{2^{n/2}\,\Gamma(n/2)\,\sigma^n}\; y^{n-1} e^{-y^2/(2\sigma^2)}$

    $\sqrt{\lambda/n}$                       $\dfrac{2\,(n/2)^{n/2}}{\Gamma(n/2)\,\sigma^n}\; y^{n-1} e^{-ny^2/(2\sigma^2)}$


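Let the $P\%$-fractile of the $\chi^2$ distribution with $n - 1$ degrees of freedom be called $\chi^2_{n-1,P}$. Then

$$\operatorname{Prob}\Bigl(\chi^2_{n-1,Q} < \frac{(n - 1)\hat\sigma^2}{\sigma_0^2} < \chi^2_{n-1,P}\Bigr) = P - Q.$$

We solve this equation for $\sigma_0^2$ and obtain the $(P - Q)\%$-confidence interval for $\sigma_0^2$:

$$\frac{n - 1}{\chi^2_{n-1,P}}\,\hat\sigma^2 < \sigma_0^2 < \frac{n - 1}{\chi^2_{n-1,Q}}\,\hat\sigma^2. \tag{9.17}$$

If we have $u$ unknowns instead of 1, the number $n - 1$ changes to $n - u$. For the two-sided confidence interval $Q > 0$ and $P < 100$, we put $Q = 100 - P$. Then equation (9.17) becomes

$$\frac{n - u}{\chi^2_{n-u,P}}\,\hat\sigma^2 < \sigma_0^2 < \frac{n - u}{\chi^2_{n-u,(100-P)}}\,\hat\sigma^2. \tag{9.18}$$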

Table 9.4 Probabilities for one-sided $P\%$-fractiles of the $\chi^2$ distribution

      n        1%        5%       90%       95%
      1     0.00016   0.0039      2.7       3.8
      2     0.020     0.103       4.6       6.0
      3     0.115     0.352       6.3       7.8
      4     0.30      0.71        7.8       9.5
      5     0.55      1.15        9.2      11.1
      6     0.87      1.64       10.6      12.6
      8     1.65      2.73       13.4      15.5
     10     2.56      3.94       16.0      18.3
     20     8.3      10.9        28.4      31.4
     30    15.0      18.5        40.3      43.8
     40    22.2      26.5        51.8      55.8
     50    29.7      34.8        63.2      67.5
    100    70.1      77.9       118.5     124.3


If the confidence is selected as 90% then often this is split into two equal halves and one calculates the probability to get a lower and an upper limit.

Suppose an empirical standard deviation $\hat\sigma$ is determined on the basis of $f$ degrees of freedom. We define a confidence interval (of confidence $1 - \alpha$) for $\sigma_0$ through the probability conditions

$$\operatorname{Prob}(a < \sigma_0 < b) = 1 - \alpha \quad\text{and}\quad \operatorname{Prob}(\sigma_0 < a) = \operatorname{Prob}(\sigma_0 > b) = \alpha/2$$

and we seek the limits $a$ and $b$. Assume the original observations are normally distributed, so that $\chi_f^2 = f\hat\sigma^2/\sigma_0^2$ and

$$\operatorname{Prob}\bigl(\chi_f^2 < \chi^2_{f,\alpha/2}\bigr) = \alpha/2 \quad\text{and}\quad \operatorname{Prob}\bigl(\chi_f^2 < \chi^2_{f,1-(\alpha/2)}\bigr) = 1 - (\alpha/2).$$

Those combine into

$$\operatorname{Prob}(a < \sigma_0 < b) = 1 - \alpha$$

with

$$a = \sqrt{\frac{f}{\chi^2_{f,1-(\alpha/2)}}}\;\hat\sigma = \gamma_a \hat\sigma \quad\text{and}\quad b = \sqrt{\frac{f}{\chi^2_{f,\alpha/2}}}\;\hat\sigma = \gamma_b \hat\sigma.$$

The factors $\gamma_a$ and $\gamma_b$ obviously depend on the confidence probability $1 - \alpha$ and the number of degrees of freedom. The factors are shown in Table 9.5 for $1 - \alpha = 0.95$. This uses Table 9.4 which can be found in Abramowitz & Stegun (1972), Table 26.8. For $f = 5$, with the 5% and 95% fractiles of Table 9.4 (confidence 0.90), we get $0.672\,\hat\sigma < \sigma_0 < 2.09\,\hat\sigma$.
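
The limiting factors can be computed rather than looked up. The sketch below is one possible approach under stated assumptions: the $\chi^2$ fractiles are found by solving $K_f(x) = P$ numerically with `fzero` applied to `gammainc`, and the names `K`, `chiLow`, `chiHigh` are illustrative. With $f = 5$ and $1 - \alpha = 0.95$ it should reproduce the $f = 5$ column of Table 9.5.

```matlab
% Limiting factors gamma_a, gamma_b for a confidence interval on sigma_0
f     = 5;            % degrees of freedom
alpha = 0.05;         % confidence 1 - alpha = 0.95, as in Table 9.5

K = @(x) gammainc(x / 2, f / 2);     % cumulative chi-square, f degrees of freedom
chiLow  = fzero(@(x) K(x) - alpha / 2,       [1e-6, 200]);   % alpha/2 fractile
chiHigh = fzero(@(x) K(x) - (1 - alpha / 2), [1e-6, 200]);   % 1 - alpha/2 fractile

gamma_a = sqrt(f / chiHigh);   % lower limit is gamma_a * sigma_hat
gamma_b = sqrt(f / chiLow);    % upper limit is gamma_b * sigma_hat

fprintf('gamma_a = %.3f, gamma_b = %.2f\n', gamma_a, gamma_b);
% prints: gamma_a = 0.624, gamma_b = 2.45
```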


A common geodetic least-squares problem is the estimation of coordinates of control points for a network. Simultaneously we get a covariance matrix for the estimates. This leads to the construction of the confidence ellipse, see Section 9.8. We now ask what is the probability that the estimated point will lie within this confidence ellipse. $K_2(c^2)$ tells the probability for the following inequality:

$$\frac{x^2}{\lambda_1} + \frac{y^2}{\lambda_2} \le c^2.$$


Table 9.5 Limiting factors $\gamma_a$ and $\gamma_b$ for confidence interval with probability 0.95

    f        1      2      3      4      5      10     20     50     100
    γ_a    0.446  0.521  0.567  0.599  0.624  0.699  0.765  0.837  0.879
    γ_b    31.9   6.28   3.73   2.87   2.45   1.76   1.44   1.24   1.16


Table 9.6 $\chi^2$ with $n = 2$: Probability $K_2(c^2)$ that $x^2 + y^2 \le c^2$

    c      K_2(c^2)        K_2(c^2)     c
    1      0.3935          0.90         2.146
    2      0.8647          0.95         2.448
    3      0.9889          0.99         3.035


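The integral of $p_2(x) = \frac{1}{2\,\Gamma(1)}\, e^{-x/2} = \tfrac12 e^{-x/2}$ gives the cumulative probability $K_2$:

$$K_2(c^2) = \operatorname{Prob}(\chi^2 \le c^2) = \tfrac12 \int_0^{c^2} e^{-x/2}\,dx = 1 - e^{-c^2/2}. \tag{9.19}$$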
0 


We indicate some values in Table 9.6. 
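
Because (9.19) is explicit, the entries of Table 9.6 follow from two one-line formulas: $K_2(c^2) = 1 - e^{-c^2/2}$ for a given radius, and $c = \sqrt{-2\ln(1 - P)}$ for a given probability $P$. A short sketch with illustrative variable names:

```matlab
% Entries of Table 9.6 from K_2(c^2) = 1 - exp(-c^2/2)
c  = [1 2 3];
K2 = 1 - exp(-c.^2 / 2);            % probability inside the circle of radius c

P  = [0.90 0.95 0.99];
cP = sqrt(-2 * log(1 - P));         % radius that encloses probability P

fprintf('K2: %.4f %.4f %.4f\n', K2);   % 0.3935 0.8647 0.9889
fprintf('c : %.3f %.3f %.3f\n', cP);   % 2.146 2.448 3.035
```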


F Distribution 


The ratio of sums of squares is often encountered in geodetic practice. Let $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$ be normally distributed independent random variables (with $\mu = 0$ and $\sigma = 1$ as before). We define $\kappa$ as the dimensionless quantity

$$\kappa = \frac{\dfrac{1}{m}\displaystyle\sum_{i=1}^{m} x_i^2}{\dfrac{1}{n}\displaystyle\sum_{j=1}^{n} y_j^2} = \frac{s/m}{t/n}. \tag{9.20}$$

We want the probability $F_{mn}(x)$ that this fraction $\kappa$ is smaller than a given constant $x$. The $\chi^2$ probability densities for the individual sums $s$ and $t$ are

$$p_m(x) = \frac{1}{2^{m/2}\,\Gamma(m/2)}\; x^{(m-2)/2} e^{-x/2} \quad\text{and}\quad p_n(y) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\; y^{(n-2)/2} e^{-y/2}.$$

Since $s$ and $t$ are independent, the combined probability density is the product

$$p_{mn}(x, y) = \frac{1}{2^{(m+n)/2}\,\Gamma(m/2)\,\Gamma(n/2)}\; x^{(m-2)/2}\, y^{(n-2)/2}\, e^{-(x+y)/2}.$$

After some calculations we get the final formula for this $F$ distribution:

$$F_{mn}(x) = \frac{\Gamma\bigl((m+n)/2\bigr)}{\Gamma(m/2)\,\Gamma(n/2)} \Bigl(\frac{m}{n}\Bigr)^{m/2} \int_0^x \frac{\kappa^{(m-2)/2}}{\bigl(1 + (m/n)\kappa\bigr)^{(m+n)/2}}\,d\kappa. \tag{9.21}$$


9.2 Mean, Variance, and Standard Deviation


We now find the mean $\mu$ of any distribution $p(x)$. The symmetry of the normal distribution guarantees that the built-in number $\mu$ is the mean. In general $\mu$ is the "expected value." To find $\mu$, multiply outcome $x$ by probability $p(x)$ and integrate:

$$\text{mean} = \mu = E\{X\} = \int_{-\infty}^{\infty} x\,p(x)\,dx. \tag{9.22}$$

This is usually different from the point $x = m$ where $F(m) = \tfrac12$. That point is the median, since there are equal (50%) probabilities of $x < m$ and $x > m$.

Together with the mean we introduce the variance. It is always written $\sigma^2$ and in the normal distribution it measures the width of the curve. The variance $\sigma^2$ is the expected value of $(X - \text{mean})^2 = (X - \mu)^2$. Multiply outcome times probability and integrate:

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 p(x)\,dx = \int_{-\infty}^{\infty} x^2 p(x)\,dx - \mu^2. \tag{9.23}$$

The standard deviation (written $\sigma$) is the square root of $\sigma^2$.

In practice we may repeat an observation many times. Usually, economy will dictate some limit. Each independent observation produces an outcome $X$. The average of the outcomes from $N$ observations is $\hat{X}$ (called "X hat"):

$$\hat{X} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \text{average outcome}. \tag{9.24}$$

Frequently all we know about the random variable $X$ is its mean $\mu$ and variance $\sigma^2$. It is amazing how much information this gives about the average $\hat{X}$:


Law of Averages: $\hat{X}$ is almost sure to approach $\mu$ as $N \to \infty$.


Central Limit Theorem: The sum of a large number of independent identically distributed random variables with finite means and variances, standardized to have mean 0 and variance 1, is approximately normally distributed.


No matter what the probability for $X$, the probabilities for $\hat{X}$ move toward the normal bell-shaped curve. The standard deviation of the average is close to $\sigma/\sqrt{N}$. In the Law of Averages, "almost sure" means that the chance of $\hat{X}$ not approaching $\mu$ is zero.

The quantity $\sigma/\mu$ is referred to as the relative accuracy or the variation coefficient. The inverse quantity $\mu/\sigma$ is the signal-to-noise ratio.
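
Both statements can be observed in a small experiment. The sketch below is only an illustration: it assumes a uniform distribution on $[0, 1]$ (so $\mu = 1/2$ and $\sigma^2 = 1/12$) and arbitrary sizes `N` and `trials`, then compares the spread of the averages with $\sigma/\sqrt{N}$.

```matlab
% Averages of N uniform samples: mean near mu, spread near sigma/sqrt(N)
N      = 25;                  % observations per average
trials = 40000;               % number of repeated experiments
X      = rand(N, trials);     % uniform on [0,1]: mu = 1/2, sigma^2 = 1/12

Xhat = mean(X, 1);            % one average per experiment
mu = 0.5;  sigma = sqrt(1/12);

fprintf('mean of averages %.4f (mu = %.4f)\n', mean(Xhat), mu);
fprintf('std of averages  %.4f (sigma/sqrt(N) = %.4f)\n', std(Xhat), sigma / sqrt(N));
```

A histogram of `Xhat` would already look close to a bell-shaped curve, even though the individual samples are uniformly distributed.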

Finally we derive some useful computational rules for means and variances. Let $h$ be a real function. Then $Y = h(X)$ is a random variable and we have

$$E\{Y\} = E\{h(X)\} = \int_{-\infty}^{\infty} h(x)\,p(x)\,dx. \tag{9.25}$$

If $a$ and $b$ are real numbers and $h(x) = ax + b$ this yields

$$E\{aX + b\} = aE\{X\} + b. \tag{9.26}$$

Evaluating (9.25) for $h(x) = (x - E\{X\})^2 = (x - \mu)^2$ gives

$$\sigma^2\{X\} = \int_{-\infty}^{\infty} (x - E\{X\})^2 p(x)\,dx. \tag{9.27}$$

When we introduce a linear substitution $aX + b$, we get

$$\begin{aligned}
\sigma^2\{aX + b\} &= \int_{-\infty}^{\infty} \bigl(ax + b - E\{aX + b\}\bigr)^2 p(x)\,dx \\
&= \int_{-\infty}^{\infty} \bigl(ax + b - aE\{X\} - b\bigr)^2 p(x)\,dx \\
&= a^2 \int_{-\infty}^{\infty} (x - E\{X\})^2 p(x)\,dx = a^2 \sigma^2\{X\}.
\end{aligned}$$

The shift by $b$ leaves $\sigma^2$ unchanged. The scaling by $a$ multiplies that variance by $a^2$:

$$\sigma^2\{aX + b\} = a^2 \sigma^2\{X\}. \tag{9.28}$$


Notice that in the formulation of mean value and standard deviation, the order of 
observations is of no consequence. Since none of the observations are time-dependent, the 
time at which an element is observed within the sequence cannot affect the outcome. 

A time-ordered sequence of random variables is called a discrete random process. 


9.3 Covariance 


Until now we have studied only independent random variables. Next we make the step to two random variables $X_1$ and $X_2$, possibly correlated. Consider the probability that $X_1$ falls between $x_1$ and $x_1 + dx_1$ and in the same observation $X_2$ falls between $x_2$ and $x_2 + dx_2$. This produces the joint probability density $p(x_1, x_2)\,dx_1 dx_2$. Again its integral is 1:

$$\operatorname{Prob}(-\infty < X_1, X_2 < +\infty) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} p(x_1, x_2)\,dx_1 dx_2 = 1. \tag{9.29}$$

The mean value $\mu_1$ of $X_1$, and the mean value $\mu_2$ of $X_2$, are

$$\mu_1 = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x_1\, p(x_1, x_2)\,dx_1 dx_2, \qquad
\mu_2 = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x_2\, p(x_1, x_2)\,dx_1 dx_2.$$

The mean value of a function $g(x_1, x_2)$ is

$$E\{g(X_1, X_2)\} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x_1, x_2)\,p(x_1, x_2)\,dx_1 dx_2. \tag{9.30}$$

For the important case $g(X_1, X_2) = X_1 + X_2$ we get

$$E\{X_1 + X_2\} = \int\!\!\int (x_1 + x_2)\,p(x_1, x_2)\,dx_1 dx_2 = E\{X_1\} + E\{X_2\}. \tag{9.31}$$

The covariance between $X_1$ and $X_2$ is defined as

$$\begin{aligned}
\sigma_{12} = \operatorname{cov}\{X_1, X_2\} &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x_1 - \mu_1)(x_2 - \mu_2)\,p(x_1, x_2)\,dx_1 dx_2 \qquad (9.32) \\
&= E\{X_1 X_2\} - \mu_1 \mu_2. \qquad (9.33)
\end{aligned}$$

In case $X_2 = X_1$ we get the variance $\sigma_1^2$ of the random variable $X_1$. The covariance of a variable with itself is its variance. Thus by convention $\sigma_{11}$ is written $\sigma_1^2$.

Let $X_1$ denote the weight and $X_2$ the height of a number of individuals. An observed value of $X_2$ tends to be small if the corresponding value of $X_1$ is small and conversely. The random variables $X_1$ and $X_2$, weight and height, are said to be dependent. They do not satisfy the following requirement: Two random variables with probability densities $p_1(x_1)$ and $p_2(x_2)$ and joint probability density $p(x_1, x_2)$ are independent if

$$p(x_1, x_2) = p_1(x_1)\,p_2(x_2) \qquad \text{for all } x_1 \text{ and } x_2. \tag{9.34}$$

Independence is an exceedingly important property. In general we do not know $p(x_1, x_2)$ from the separate probabilities.

Finally we generalize to $n$ random variables. We observe $X = (X_1, X_2, \ldots, X_n)$. The vector of means is $\mu = (\mu_1, \mu_2, \ldots, \mu_n)$ where $\mu_i = E\{X_i\}$. The most important quantity in geodetic statistics is the expectation of the matrix $(X - \mu)(X - \mu)^{T}$ which contains all the products $(X_i - \mu_i)(X_j - \mu_j)$. The expectations of all these products enter the covariance matrix $\Sigma_X$:

$$\Sigma_X = E\{(X - \mu)(X - \mu)^{T}\} = E\{XX^{T}\} - \mu\mu^{T}. \tag{9.35}$$

This covariance matrix contains every $\sigma_{ij} = E\{(X_i - \mu_i)(X_j - \mu_j)\}$ and is symmetric:

$$\Sigma_X = \begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2
\end{bmatrix}. \tag{9.36}$$

Covariance is a measure of the stochastic dependence between two parameters $X_i$ and $X_j$. Clearly $\sigma_{ij}$ measures a coupling between the errors of $X_i$ and the errors of $X_j$. One can speak about stochastic dependence or independence. If $X_i$ and $X_j$ have a tendency to deviate either both positively or both negatively, then $\sigma_{ij}$ will be positive. This does not imply that a positive $X_i - \mu_i$ cannot occur together with a negative $X_j - \mu_j$, but it is less likely. Similarly $\sigma_{ij}$ will be negative if a positive $X_i - \mu_i$ prefers to be coupled to a negative $X_j - \mu_j$. It is important to note the difference between two random variables being independent and being uncorrelated. They are independent if and only if $p(x_1, x_2) = p_1(x_1)\,p_2(x_2)$. They are uncorrelated if $\sigma_{ij} = 0$. Two independent random variables $X_i$ and $X_j$ are uncorrelated (the converse need not hold).

The covariance matrix $\Sigma_X$ is positive definite (or at least semidefinite). For proof, note that any linear combination $l = c_1(X_1 - \mu_1) + \cdots + c_n(X_n - \mu_n)$ has

$$0 \le E\{l^2\} = \sum_i \sum_j c_i c_j \sigma_{ij} = \begin{bmatrix} c_1 & \cdots & c_n \end{bmatrix} \Sigma_X \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}. \tag{9.37}$$

Thus $c^{T}\Sigma_X c$ is nonnegative for any vector $c$, which makes $\Sigma_X$ semidefinite.

The correlation coefficient $\rho_{ij}$ is a standardized covariance, never exceeding 1:

$$\rho_{ij} = \sigma_{ij}\Big/\sqrt{\sigma_i^2 \sigma_j^2} = \sigma_{ij}/(\sigma_i \sigma_j) \quad\text{and}\quad |\rho_{ij}| \le 1. \tag{9.38}$$
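
In practice $\Sigma_X$ is estimated from samples. The following sketch uses six made-up weight and height pairs (purely illustrative data, not from the text) to form a sample covariance matrix with `cov`, the correlation coefficient of (9.38), and the eigenvalues that confirm semidefiniteness.

```matlab
% Sample covariance matrix and correlation coefficient for two variables
% (the six weight/height pairs below are made-up illustration data)
X = [62 165; 70 172; 81 180; 58 160; 90 185; 75 178];   % columns: weight, height

S   = cov(X);                               % 2 by 2 sample covariance matrix
rho = S(1,2) / sqrt(S(1,1) * S(2,2));       % correlation coefficient (9.38)
lam = eig(S);                               % eigenvalues, none negative

fprintf('rho = %.3f, smallest eigenvalue = %.3g\n', rho, min(lam));
```

Here `rho` comes out positive, in line with the remark that small weights tend to go with small heights.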


Example 9.2 Covariance matrices must be nonnegative definite. In practice roundoff errors and other error sources may cause the matrix to become indefinite. One or more small negative eigenvalues can appear. Whenever one encounters such a matrix it must be repaired in some way.

A good procedure is to build up a modified matrix using only the positive eigenvalues and their eigenvectors. The following MATLAB code will do the job by discarding any $\lambda_i < 0$:


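    function R = repair(S)                     % M-file: repair
    % REPAIR  Repair of indefinite covariance matrix:
    %         rebuild S from its positive eigenvalues and their eigenvectors
    [v, lambda] = eig(S);
    n = size(S, 1);
    R = zeros(n, n);
    for i = 1:n
       if lambda(i,i) > 0
          % add the rank-one term lambda_i * v_i * v_i'
          R = R + lambda(i,i) * v(:,i) * v(:,i)';
       end
    end
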
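
To see the effect of the repair, the same idea can be tried inline on a small indefinite matrix. The test matrix `S` is an assumption chosen so that the result is easy to verify by hand: its eigenvalues are $-1$ and $3$, and the eigenvector for $3$ is $(1, 1)/\sqrt2$.

```matlab
% Try the repair idea on a small indefinite symmetric matrix
S = [1 2; 2 1];                  % eigenvalues -1 and 3, so S is indefinite
[v, lambda] = eig(S);

keep = diag(lambda) > 0;         % positions of the positive eigenvalues
R = v(:, keep) * lambda(keep, keep) * v(:, keep)';   % rebuild from those only

disp(R)                          % approximately [1.5 1.5; 1.5 1.5]
disp(eig(R)')                    % approximately 0 and 3, none negative
```

The repaired matrix is only semidefinite here, because one eigenvalue has been replaced by zero.
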
9.4 Inverse Covariances as Weights 

Suppose we observe the position of a satellite, many times. Those observations $b_1, \ldots, b_m$ have errors and the errors may not be independent. We want to know (for example) the velocity of the satellite. The question in this section is how to use the a priori information we have about the accuracy and the degree of independence of the observations. That information is contained in the covariance matrix $\Sigma_b$.

As usual in least squares, we are fitting many observations (positions at different times) by a few parameters (initial position and velocity). A perfect fit is not expected. We minimize the lack-of-fit expressed by the residual $r = b - Ax$. When we choose the vector $\hat{x}$ that makes $r$ as small as possible, we are allowed to define what "small" means. A very basic choice of the quantity to minimize is the weighted length $r^{T}Cr$, and the problem is to choose appropriately the weights in $C$.

We want to prove that choosing the weight matrix $C$ inversely proportional to the covariance matrix $\Sigma_b$ leads to the best linear unbiased estimate (BLUE) of the unknown parameters. The estimate is $\hat{x}$ and we will find its covariance.

First a few comments about the choice of $C$. All we know about the errors, except for their mean $E\{r\} = E\{b - Ax\} = 0$, is contained in the matrix $\Sigma_b$. This is the covariance matrix $E\{rr^{T}\}$, or entrywise $E\{r_i r_j\}$, of the observation errors. What we require of any rule $\hat{x} = Lb$, which estimates the true but unknown parameters $x$ from the measurements $b$, is that it be linear and unbiased. It is certainly linear if $L$ is a matrix. It is unbiased if the expected error $x - \hat{x}$ (the error in the estimate, not in the observations) is also zero:

$$E\{x - \hat{x}\} = E\{x - Lb\} = E\{x - LAx - Lr\} = E\{(I - LA)x\} = 0. \tag{9.39}$$

Thus $L$ is unbiased if it is a left inverse of the rectangular matrix $A$: $LA = I$.* Under this restriction, Gauss picked out the best $L$ (we call it $L_0$) and the best $C$ in the following way: The matrix $C$ should be $\sigma_0^2$ times the inverse of the covariance matrix $\Sigma_b$.

His rule leads to $L_0 = (A^{T}\Sigma_b^{-1}A)^{-1}A^{T}\Sigma_b^{-1}$, which does satisfy $L_0 A = I$. For completeness we include a proof that this choice is optimal.


The best linear unbiased estimate (BLUE) is the one with $C = \sigma_0^2 \Sigma_b^{-1}$. The covariance matrix $\Sigma_{\hat{x}}$ is as small as possible. The optimal estimate $\hat{x}$ and the optimal matrix $L_0$ are

$$\hat{x} = (A^{T}\Sigma_b^{-1}A)^{-1}A^{T}\Sigma_b^{-1}\,b = L_0 b. \tag{9.40}$$

This choice minimizes the expected error in the estimate, measured by the covariance matrix $\Sigma_{\hat{x}} = E\{(x - \hat{x})(x - \hat{x})^{T}\}$. Formula (9.42) shows this smallest (best) matrix $\Sigma_{\hat{x}}$.


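Proof The matrix to be minimized is

$$\begin{aligned}
\Sigma_{\hat{x}} &= E\{(x - \hat{x})(x - \hat{x})^{T}\} = E\{(x - Lb)(x - Lb)^{T}\} \\
&= E\{(x - LAx - Lr)(x - LAx - Lr)^{T}\}.
\end{aligned}$$

Since $x = LAx$ and $L$ is linear, this is just

$$\Sigma_{\hat{x}} = E\{(Lr)(Lr)^{T}\} = L\,E\{rr^{T}\}\,L^{T} = L\Sigma_b L^{T}.$$

Thus it is $L\Sigma_b L^{T}$ that we minimize, subject to $LA = I$. To show that $L_0$ is the optimal choice, write any $L$ as $L_0 + (L - L_0)$ and substitute into $\Sigma_{\hat{x}} = L\Sigma_b L^{T}$:

$$\Sigma_{\hat{x}} = L_0\Sigma_b L_0^{T} + (L - L_0)\Sigma_b L_0^{T} + L_0\Sigma_b (L - L_0)^{T} + (L - L_0)\Sigma_b (L - L_0)^{T}. \tag{9.41}$$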
The middle terms are transposes of one another, and they are zero:

$$(L - L_0)\Sigma_b L_0^{T} = (L - L_0)\Sigma_b \Sigma_b^{-1} A (A^{T}\Sigma_b^{-1}A)^{-1} = 0$$

because $\Sigma_b \Sigma_b^{-1}$ is the identity and $(L - L_0)A = I - I = 0$. Furthermore the last term in (9.41) is symmetric and at least positive semidefinite. It is smallest when $L = L_0$, which is therefore the minimizing choice. The proof is complete.

*$LA = I$ just means: Whenever $b$ can be fitted exactly by some $x$ (so that $Ax = b$) our choice $\hat{x}$ should be $x$: $\hat{x} = Lb = LAx = x$. If the data lies on a straight line, then the best fit should be that line.


The matrix $\Sigma_{\hat{x}}$ for this optimal estimate $\hat{x}$ comes out neatly when we simplify

$$\Sigma_{\hat{x}} = L_0\Sigma_b L_0^{T} = (A^{T}\Sigma_b^{-1}A)^{-1}A^{T}\Sigma_b^{-1}\,\Sigma_b\,\Sigma_b^{-1}A(A^{T}\Sigma_b^{-1}A)^{-1}.$$

Cancelling $\Sigma_b$ and $A^{T}\Sigma_b^{-1}A$ with their inverses gives the key formula for $\Sigma_{\hat{x}}$:

When $\hat{x}$ is computed from the rule $A^{T}\Sigma_b^{-1}A\hat{x} = A^{T}\Sigma_b^{-1}b$ its covariance matrix is

$$\Sigma_{\hat{x}} = (A^{T}\Sigma_b^{-1}A)^{-1}. \tag{9.42}$$

This $\Sigma_{\hat{x}}$ gives the expected error in $\hat{x}$ just as $\Sigma_b$ gave the expected error in $b$. These matrices average over all experiments; they do not depend on the particular measurement $b$ that led to the particular estimate $\hat{x}$. We emphasize that $\Sigma_{\hat{x}} = E\{(x - \hat{x})(x - \hat{x})^{T}\}$ is the fundamental matrix in filtering theory. Its inverse $A^{T}\Sigma_b^{-1}A$ is the information matrix, and it is exactly the triple product $A^{T}CA/\sigma_0^2$. It measures the information content of the experiment. It goes up as the variance $\Sigma_b$ goes down, since $C = \sigma_0^2\Sigma_b^{-1}$. It also goes up as the experiment continues; every new row in $A$ makes it larger.

In the most important case, with independent errors and unit variances and $C = I$, we are back to "white noise" and ordinary least squares. Then the information matrix is $A^{T}A$. Its inverse is $\Sigma_{\hat{x}}$.
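
Formulas (9.40) and (9.42) translate directly into MATLAB. The sketch below reuses the matrix and observations of Example 8.6 and assumes, purely for illustration, a diagonal covariance matrix `Sigma_b` whose variances are not taken from the text; $\sigma_0^2 = 1$ is also assumed, so that $C = \Sigma_b^{-1}$.

```matlab
% Weighted least squares with C = inv(Sigma_b) (taking sigma_0^2 = 1)
A = [1 0 0; 0 1 0; 0 0 1; 1 1 0; 1 1 1; 0 1 1];   % matrix of Example 8.6
b = [3.17; 1.12; 2.25; 4.31; 6.51; 3.36];
Sigma_b = diag([1 1 1 2 3 2]) * 1e-4;              % assumed variances, not from the text

C       = inv(Sigma_b);            % weight matrix
Ninfo   = A' * C * A;              % information matrix A' inv(Sigma_b) A
L0      = Ninfo \ (A' * C);        % optimal left inverse of A
xhat    = L0 * b;                  % best linear unbiased estimate (9.40)
Sigma_x = inv(Ninfo);              % covariance matrix of xhat (9.42)

disp(xhat');                       % the estimates
disp(sqrt(diag(Sigma_x))');        % their standard deviations
```

The product `L0 * A` is the identity, which is the unbiasedness condition $LA = I$, and `L0 * Sigma_b * L0'` agrees with `Sigma_x`.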


Remark 9.1 We can always obtain $C = I$ from $C = \sigma_0^2\Sigma_b^{-1}$ by a change of variables. Factor the matrix $\sigma_0^2\Sigma_b^{-1}$ into $W^{T}W$ and introduce $\bar{r} = Wr$. (Often $W$ is denoted $\sigma_0\Sigma_b^{-1/2}$, the matrix square root of $\sigma_0^2\Sigma_b^{-1}$.) These standardized errors $\bar{r} = W(b - Ax)$ still have mean zero, and their covariance matrix is $\sigma_0^2$ times the identity (the identity itself when $\sigma_0^2 = 1$):

$$E\{(Wr)(Wr)^{T}\} = W E\{rr^{T}\} W^{T} = W\Sigma_b W^{T} = \sigma_0^2 I.$$

The weighting matrix $W$ returns us to white noise, a unit covariance problem with simpler theory and simpler computations.†
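
One concrete way to obtain such a $W$ is a Cholesky factorization. The sketch below assumes $\sigma_0^2 = 1$ and an illustrative positive definite `Sigma_b` (not from the text); `chol` returns an upper triangular `W` with `W' * W` equal to the factored matrix.

```matlab
% Whitening: factor inv(Sigma_b) = W'W and check that W*Sigma_b*W' = I
% (sigma_0^2 = 1 is assumed; Sigma_b is an illustrative covariance matrix)
Sigma_b = [4 2 0; 2 3 1; 0 1 2];      % symmetric positive definite
Ci = inv(Sigma_b);
Ci = (Ci + Ci') / 2;                  % remove roundoff asymmetry before chol
W  = chol(Ci);                        % upper triangular, W'*W = inv(Sigma_b)

I3 = W * Sigma_b * W';                % covariance of the standardized errors W*r
disp(I3)                              % identity matrix up to roundoff
```

Other factorizations, such as the symmetric matrix square root, would serve equally well; the Cholesky factor is simply cheap to compute.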


Remark 9.2 If one of the variances is zero, say $\sigma_1^2 = 0$, then the first measurement is exact. The first row and column of $\Sigma_b$ are zero, and $\Sigma_b$ is not positive definite or even invertible. (The weighting matrix has $(\Sigma_b^{-1})_{11} = \infty$.) This just means that the first equation in $Ax = b$ should be given infinite weight and be solved exactly. If this were true of all $m$ measurements we would have to solve all $m$ equations $Ax = b$, without the help of least squares; but with exact measurements that would be possible.


To repeat we have the following relation between the covariance matrix and weight matrix:

$$C\Sigma_b = \sigma_0^2 I. \tag{9.43}$$

If the observations are uncorrelated this is simply

$$c_1\sigma_1^2 = c_2\sigma_2^2 = \cdots = c_n\sigma_n^2 = 1 \cdot \sigma_0^2. \tag{9.44}$$

Later we shall examine the quantity $\sigma_0^2$ more closely. For the moment just regard it as a "variance with unit weight."

†Throughout mathematics there is this choice between a change of variable or a change in the definition of length. We introduce $\bar{x} = Wx$ or we measure the original $x$ by $\|x\|^2 = x^{T}\Sigma^{-1}x$. It is really Hobson's choice; it makes no difference.


For experienced readers Of course, the weights or the variances must be decided before any least-squares estimation. In practice this happens in the following way. If we want observation $X_2$ with variance $\sigma_2^2$ to have weight $c_2 = 1$, then the weight $c_1$ is given as $\sigma_2^2/\sigma_1^2$.

Most of the time one starts with uncorrelated observations. (A typical correlated observation is the difference of original observations. This is particularly the case with GPS observations or angles derived from theodolite direction observations.) Uncorrelated observations imply that all covariances $\sigma_{ij}$ ($i \ne j$) are zero. Then the covariance matrix of the observations $b$ is diagonal:

$$\Sigma_b = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_m^2) = \begin{bmatrix} \sigma_1^2 & & \\ & \ddots & \\ & & \sigma_m^2 \end{bmatrix}. \tag{9.45}$$


It is suitable to work with weights deviating as little as possible from 1. In geodesy 
we often introduce unit weight for 


- a distance observed once and of length 1 km 
- a direction as observed in one set 


- a geometric or trigonometric leveling observed once and of length 1 km. 


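Example 9.3 Let m observations of the same variable x be given as $b_1, \ldots, b_m$. We want
to prove that the weighted mean $\hat{x}$ and its variance are given as

$$\hat{x} = \frac{e^TCb}{e^TCe} \qquad\text{and}\qquad \hat\sigma_{\hat{x}}^2 = \frac{1}{m-1}\,\frac{\hat{r}^TC\hat{r}}{e^TCe}.$$

Here we introduced $\hat{r} = b - A\hat{x}$ and the vector $e = (1, 1, \ldots, 1)$. The m observation
equations $x = b_i - r_i$ can be written in the matrix form $Ax = b - r$:

$$\begin{bmatrix}1\\ 1\\ \vdots\\ 1\end{bmatrix} x = \begin{bmatrix}b_1\\ b_2\\ \vdots\\ b_m\end{bmatrix} - \begin{bmatrix}r_1\\ r_2\\ \vdots\\ r_m\end{bmatrix}.$$

The normal equations $A^TCA\hat{x} = A^TCb$ are

$$\begin{bmatrix}1 & \cdots & 1\end{bmatrix}\begin{bmatrix}c_1 & & \\ & \ddots & \\ & & c_m\end{bmatrix}\begin{bmatrix}1\\ \vdots\\ 1\end{bmatrix}\hat{x} = \begin{bmatrix}1 & \cdots & 1\end{bmatrix}\begin{bmatrix}c_1 & & \\ & \ddots & \\ & & c_m\end{bmatrix}\begin{bmatrix}b_1\\ \vdots\\ b_m\end{bmatrix}.$$

This is $\sum_{i=1}^m c_i\hat{x} = \sum_{i=1}^m c_ib_i$ and the solution is

$$\hat{x} = \sum_{i=1}^m c_ib_i \Big/ \sum_{i=1}^m c_i = e^TCb / e^TCe.$$

According to Table 9.7 (on page 336) we have $\hat{r}^TC\hat{r} = b^TCb - b^TCA\hat{x}$ which yields

$$\hat{r}^TC\hat{r} = \sum c_ib_i^2 - \hat{x}\sum c_ib_i = \sum c_ib_i^2 + \hat{x}\Big(\hat{x}\sum c_i - 2\sum c_ib_i\Big)$$

or

$$\hat{r}^TC\hat{r} = \sum c_i(b_i - \hat{x})^2.$$

Then (9.60) and (9.64) yield the variance for the weighted mean value $\hat{x}$:

$$\hat\sigma_{\hat{x}}^2 = \hat\sigma_0^2\,(A^TCA)^{-1} = \frac{\hat{r}^TC\hat{r}}{m-1}\cdot\frac{1}{e^TCe} = \frac{\sum c_i(b_i - \hat{x})^2}{(m-1)\sum c_i}.$$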
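
The weighted mean of Example 9.3 can be checked numerically. The following is a minimal sketch in Python (standard library only); the function name `weighted_mean` and the sample data are illustrative assumptions, not part of the text or of its M-files.

```python
def weighted_mean(b, c):
    """Weighted mean x_hat = e^T C b / e^T C e and its estimated variance.

    b : list of m observations of the same quantity
    c : list of m positive weights (diagonal of C)
    """
    m = len(b)
    sum_c = sum(c)                                        # e^T C e
    x_hat = sum(ci * bi for ci, bi in zip(c, b)) / sum_c  # e^T C b / e^T C e
    # weighted sum of squared residuals r^T C r with r = b - x_hat
    rCr = sum(ci * (bi - x_hat) ** 2 for ci, bi in zip(c, b))
    var_x_hat = rCr / ((m - 1) * sum_c)                   # n = 1 unknown
    return x_hat, var_x_hat


# Hypothetical data: three observations, the third with double weight.
b = [100.0, 102.0, 101.0]
c = [1.0, 1.0, 2.0]
x_hat, var_x_hat = weighted_mean(b, c)
print(x_hat, var_x_hat)
```

For these made-up numbers the weighted sum of squares is 2, the sum of weights is 4, and m - 1 = 2, so the variance of the weighted mean is 2/(2 * 4).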


Example 9.4 Finally we make a preliminary inspection of the concept of correlation and
its eventual influence. Suppose we have observed a given distance twice with the results
$b_1 = 100$ m and $b_2 = 102$ m. The least-squares estimate for the length is $\hat{x}$. If $C = I$ then
$\hat{x} = 101$ m is the average. In case C is diagonal we get $b_1 < \hat{x} < b_2$. When $c_1 \to \infty$
we approach $\hat{x} = b_1$ and when $c_2 \to \infty$ we obtain $\hat{x} = b_2$. The least-squares result for
diagonal C (no correlation) always lies between the smallest and the largest observation.
If the two observations are correlated the circumstances change drastically. We still
have $A^T = [\,1 \;\; 1\,]$ for the two observations. Suppose the inverse covariance matrix is

$$C = \Sigma_b^{-1} = \begin{bmatrix}5 & 2\\ 2 & 1\end{bmatrix}^{-1} = \begin{bmatrix}1 & -2\\ -2 & 5\end{bmatrix}.$$

Now $A^TCA = 2$ and $A^TCb = 206$. Therefore $\hat{x} = 103 > b_2$. A strong positive correlation
combined with a large weight for $b_2$ results in $\hat{x} > b_2$, which cannot happen for independent
observations. In Section 11.6 we shall show how arbitrary covariance matrices may lead
to arbitrary least-squares results. You may run the M-file corrdemo.
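
The effect described in Example 9.4 is easy to reproduce. The sketch below is not the M-file corrdemo; it is a small Python illustration with an assumed helper `ls_constant`. The correlated weight matrix used here is an assumption chosen to be consistent with the stated values $A^TCA = 2$ and $A^TCb = 206$.

```python
def ls_constant(b, C):
    """Least-squares estimate of one constant from two observations.

    With A = [1, 1]^T the normal equation A^T C A x = A^T C b is scalar:
    A^T C A is the sum of all entries of C, and A^T C b is the sum of C b.
    """
    AtCA = sum(C[i][j] for i in range(2) for j in range(2))
    AtCb = sum(C[i][j] * b[j] for i in range(2) for j in range(2))
    return AtCb / AtCA


b = [100.0, 102.0]

# Uncorrelated, equal weights: the estimate is the plain average.
x_identity = ls_constant(b, [[1.0, 0.0], [0.0, 1.0]])

# Assumed correlated case: C is the inverse of [[5, 2], [2, 1]].
x_correlated = ls_constant(b, [[1.0, -2.0], [-2.0, 5.0]])

print(x_identity, x_correlated)
```

The second estimate lies outside the interval spanned by the two observations, which a diagonal C can never produce.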


9.5 Estimation of Mean and Variance 


In practice we often want to estimate unknown quantities on the basis of one or more series 
of observations. Let X1, X2,..., Xn be a set of independent random variables and let 


be defined for all given sets of data, and for any value of n. If $E\{Y\} \to \eta$ and $\sigma^2\{Y\} \to 0$
for $n \to \infty$ we say that (9.46) is an unbiased estimate of the constant $\eta$.

Note especially that the sample mean $\bar{X}$ is an unbiased estimate of $\mu$:

$$E\{\bar{X}\} = \mu \qquad\text{and}\qquad \sigma^2\{\bar{X}\} = \frac{\sigma^2}{n}. \qquad (9.47)$$

The average $\bar{X}$ is an unbiased estimate of the mean $\mu$.
Next we look for an estimate of $\sigma^2$. Start by assuming that $\mu$ is known. We consider
the function

$$\hat\sigma^2 = \frac{\sum (X_i - \mu)^2}{n}. \qquad (9.48)$$

According to the definition of variance we have $E\{(X_i - \mu)^2\} = \sigma^2$; hence $E\{\hat\sigma^2\} = \sigma^2$.
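Generally the mean $\mu$ is not known. In this case we start with the identity

$$X_i - \mu = (X_i - \bar{X}) + (\bar{X} - \mu)$$

and get

$$\sum (X_i - \mu)^2 = \sum (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2.$$

The double products vanish because $\sum (X_i - \bar{X}) = 0$. We take the mean value on both
sides and get

$$n\sigma^2 = E\Big\{\sum (X_i - \bar{X})^2\Big\} + \sigma^2.$$

Rearrange this into

$$E\Big\{\sum (X_i - \bar{X})^2\Big\} = (n - 1)\sigma^2. \qquad (9.49)$$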
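This says that

$$\hat{s}^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} \qquad\text{has}\qquad E\{\hat{s}^2\} = \sigma^2. \qquad (9.50)$$

Since $\sigma\{\bar{X}\} = \sigma/\sqrt{n}$, we also get an estimate of the standard deviation of the average:

$$\hat\sigma_{\bar{X}} = \frac{\hat{s}}{\sqrt{n}} = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n(n - 1)}}. \qquad (9.51)$$

The estimates $\hat{s}$ and $\hat\sigma_{\bar{X}}$ are both biased, due to the following fact: Let y be an unbiased
estimate of $\eta$ and g a nonlinear function of y. Then the estimate $g(y)$ for $g(\eta)$ is likely to be
biased because generally $E\{g(y)\} \neq g(E\{y\})$ for nonlinear g.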
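
A small exact computation illustrates both statements: the estimate (9.50) with divisor n - 1 is unbiased for the variance, while its square root underestimates the standard deviation on average. The sketch below is an illustration only; the helper name `exact_expectations` and the coin-toss distribution are assumptions chosen so that every possible sample can be enumerated without random numbers.

```python
from itertools import product
from math import sqrt


def exact_expectations(values, n):
    """Return (E{s^2}, E{s}) over all equally likely samples of size n.

    Each X_i is drawn uniformly from `values`; s^2 uses the divisor n - 1.
    """
    samples = list(product(values, repeat=n))
    total_s2 = 0.0
    total_s = 0.0
    for x in samples:
        xbar = sum(x) / n
        s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
        total_s2 += s2
        total_s += sqrt(s2)
    return total_s2 / len(samples), total_s / len(samples)


# Fair coin with outcomes 0 and 1: mean 0.5, variance 0.25, sigma 0.5.
E_s2, E_s = exact_expectations([0.0, 1.0], n=3)
print(E_s2, E_s)
```

Here the first value equals the true variance 0.25, while the second is smaller than the true standard deviation 0.5.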


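Example 9.5 We want to prove that $\hat{s}$ is a biased estimate for $\sigma$.
Let a random variable X have a Gamma distribution G with shape parameter $\alpha$ and
scale parameter $\beta$: $X \sim G(\alpha, \beta)$. Then $E\{X\} = \alpha\beta$ and $\sigma^2(X) = \alpha\beta^2$ and

$$p(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha}\,x^{\alpha-1}e^{-x/\beta} \qquad\text{for } x > 0.$$

The expected value of the square root of X is

$$E\{\sqrt{X}\} = \int_0^\infty x^{1/2}\,\frac{x^{\alpha-1}}{\Gamma(\alpha)\beta^\alpha}\,e^{-x/\beta}\,dx
= \frac{1}{\Gamma(\alpha)\beta^\alpha}\int_0^\infty x^{\alpha-1/2}e^{-x/\beta}\,dx
= \frac{\Gamma(\alpha + \frac12)}{\Gamma(\alpha)}\,\beta^{1/2}. \qquad (9.52)$$

For $\hat{s}^2$ as defined in (9.50) we have $\hat{s}^2 \sim G\big(\frac{n-1}{2}, \frac{2\sigma^2}{n-1}\big)$. By means of (9.52) we have

$$E\{\hat{s}\} = \frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n-1}{2})}\sqrt{\frac{2}{n-1}}\;\sigma$$

and from (9.50) we have $\sqrt{E\{\hat{s}^2\}} = \sigma$. Now we postulate that $E\{\hat{s}\} < \sqrt{E\{\hat{s}^2\}}$ or

$$\frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n-1}{2})} < \sqrt{\frac{n-1}{2}}.$$

For n = 3, 4, 5, 6 the left side is $\sqrt{\pi}/2$, $2/\sqrt{\pi}$, $3\sqrt{\pi}/4$, and $8/(3\sqrt{\pi})$. These values are
certainly smaller than $\sqrt{(n-1)/2}$. Thus $\hat{s}$ is a biased estimate of $\sigma$. Consequently, $\hat\sigma_{\bar{X}}$
in (9.51) is biased, too.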
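
The inequality of Example 9.5 can be evaluated directly with the Gamma function from Python's standard library. This is a sketch for checking the numbers; the function name `expected_s_over_sigma` is an assumption introduced here.

```python
from math import gamma, sqrt


def expected_s_over_sigma(n):
    """Ratio E{s}/sigma = Gamma(n/2) / Gamma((n-1)/2) * sqrt(2/(n-1))."""
    return gamma(n / 2) / gamma((n - 1) / 2) * sqrt(2 / (n - 1))


# The ratio stays below 1 for every n, so s underestimates sigma on average.
ratios = {n: expected_s_over_sigma(n) for n in range(3, 7)}
for n, ratio in ratios.items():
    print(n, round(ratio, 4))
```

The ratio approaches 1 as n grows, so the bias of the estimated standard deviation matters mainly for short series of observations.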


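By introducing matrix notation we may rewrite some earlier results. Let c be the
vector $(c_1, c_2, \ldots, c_n)$ and $e = (1, 1, \ldots, 1)$. Then the mean is

$$\text{Unweighted}\quad \bar{X} = \frac{e^TX}{n} \qquad\text{and weighted}\quad \bar{X} = \frac{c^TX}{c^Te}. \qquad (9.53)$$

The weighted version of (9.48) involves the diagonal matrix $C = \operatorname{diag}(c_1, \ldots, c_n)$:

$$\hat\sigma^2 = \frac{(X - \mu e)^TC(X - \mu e)}{c^Te} = \frac{X^TCX}{c^Te} - 2\mu\,\frac{c^TX}{c^Te} + \mu^2. \qquad (9.54)$$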


9.6 Propagation of Means and Covariances 


In practice we are often more interested in linear functions of the random variables than 
the random variables themselves. The most often encountered examples are covered by 
the linear transformation 


Y = BX +k, 


where B is a known m by n matrix and k (also known) is m by 1. The linearity of the 
expectation operator E yields 


$$E\{Y\} = E\{BX + k\} = BE\{X\} + k. \qquad (9.55)$$

Then the covariance for Y comes from the covariance for X. This formula is highly important,
and we will use it often!


Law of Covariance Propagation  If $Y = BX + k$ and k is known then

$$\Sigma_Y = B\Sigma_XB^T. \qquad (9.56)$$


The proof is direct. The shift by k cancels itself, and the matrix B comes outside by
linearity:

$$\begin{aligned}
\Sigma_Y &= E\{(Y - E\{Y\})(Y - E\{Y\})^T\}\\
&= E\{(BX + k - BE\{X\} - k)(BX + k - BE\{X\} - k)^T\}\\
&= E\{(BX - BE\{X\})(BX - BE\{X\})^T\}\\
&= BE\{(X - E\{X\})(X - E\{X\})^T\}B^T = B\Sigma_XB^T.
\end{aligned}$$


In geodesy, the law $\Sigma_Y = B\Sigma_XB^T$ appears in many applications. Sometimes B gives
a change of coordinates (in this case B is square). For least squares, X contains the n
observations and Y contains the m fitted parameters (in this case B is rectangular, with
m < n). We will frequently quote this law of covariance propagation. In the special
case that all $X_i$ are independent, $\Sigma_X$ is diagonal. Furthermore, if B is 1 by n and hence
$Y = b_1X_1 + \cdots + b_nX_n$, one obtains the law of error propagation:

$$\sigma_Y^2 = b_1^2\sigma_1^2 + \cdots + b_n^2\sigma_n^2. \qquad (9.57)$$

This gives the variance for a combination of n independent measurements.
Note that if the vector k is also subject to error, independently of X and with mean
zero and covariance matrix $\Sigma_k$, then this matrix would be added to $\Sigma_Y$. The law becomes

$$\Sigma_Y = B\Sigma_XB^T + \Sigma_k.$$


This law is at the heart of the Kalman filter in Chapter 17. 


Example 9.6 In leveling it is common practice to attribute the variance $\sigma_0^2$ to a leveling
observation of length l. According to equation (9.57), in leveling lines of lengths $2l, \ldots, nl$
we have variances $2\sigma_0^2, \ldots, n\sigma_0^2$. Consequently the weights are $1/2, \ldots, 1/n$. For a
leveling line observed under these conditions the weight is reciprocal to the distance.


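Example 9.7 We have two independent random variables x and y with zero mean and
variance 1. Then the linear expression $z = ax + by + c$ has a normal distribution with
mean c and variance $a^2 + b^2$:

$$\Sigma_z = \begin{bmatrix}a & b\end{bmatrix}\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}\begin{bmatrix}a\\ b\end{bmatrix} = a^2 + b^2.$$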


Example 9.8 A basic example to surveying is the coordinate transformation from polar to 
Cartesian coordinates: 


$$x = r\cos\theta \qquad\text{and}\qquad y = r\sin\theta.$$

[Figure 9.4  Pseudoranges from the satellite S to the receivers $R_1$ and $R_2$. Positions: $x_0 = 0$
with $\sigma_0 = 0.3$; $x_1 = 0.75$ with $\sigma_1 = 0.1$; $x_2 = 1.25$ with $\sigma_2 = 0.2$.]


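From given values of $(r, \theta)$ the coordinates (x, y) are found immediately. For given
variances $(\sigma_r^2, \sigma_\theta^2)$, the propagation law yields $\sigma_x^2$ and $\sigma_y^2$ by linearization. The matrix B
becomes the Jacobian matrix J containing the partial derivatives:

$$J = \begin{bmatrix}\partial x/\partial r & \partial x/\partial\theta\\ \partial y/\partial r & \partial y/\partial\theta\end{bmatrix} = \begin{bmatrix}\cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta\end{bmatrix}.$$

The law of propagation of variances yields the covariance matrix $\Sigma_x$ for x = (x, y):

$$\Sigma_x = J\begin{bmatrix}\sigma_r^2 & 0\\ 0 & \sigma_\theta^2\end{bmatrix}J^T = \begin{bmatrix}\sigma_r^2\cos^2\theta + r^2\sigma_\theta^2\sin^2\theta & (\sigma_r^2 - r^2\sigma_\theta^2)\sin\theta\cos\theta\\ (\sigma_r^2 - r^2\sigma_\theta^2)\sin\theta\cos\theta & \sigma_r^2\sin^2\theta + r^2\sigma_\theta^2\cos^2\theta\end{bmatrix}.$$

Insertion of the actual values yields the covariance matrix $\Sigma_x$. The x and y coordinates
are uncorrelated when $\theta = 0$ or $\theta = \pi/2$. Then we have a polar point determination
in the direction of one of the coordinate axes. Also $\sigma_{xy} = 0$ when the special condition
$\sigma_r^2 = r^2\sigma_\theta^2$ prevails. Then the standard deviation $\sigma_r$ of the distance measurement equals
the perpendicular error $r\sigma_\theta$. This leads to circular confidence ellipses.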
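
The polar-to-Cartesian propagation of Example 9.8 can be sketched in a few lines of Python. The function name `polar_to_cartesian_cov` and the numerical values are illustrative assumptions; the computation itself is the product of the Jacobian, the diagonal covariance of (r, theta), and the transposed Jacobian.

```python
from math import cos, sin, pi


def polar_to_cartesian_cov(r, theta, var_r, var_theta):
    """Covariance of (x, y) = (r cos(theta), r sin(theta)) by linearization.

    The inputs r and theta are assumed uncorrelated with variances
    var_r and var_theta (theta in radians).
    """
    J = [[cos(theta), -r * sin(theta)],
         [sin(theta),  r * cos(theta)]]
    D = [var_r, var_theta]  # diagonal of the input covariance matrix
    return [[sum(J[i][k] * D[k] * J[j][k] for k in range(2))
             for j in range(2)] for i in range(2)]


# Hypothetical case 1: sigma_r = r * sigma_theta, so the ellipse is a circle.
S_circle = polar_to_cartesian_cov(100.0, pi / 6, 4e-4, 4e-8)

# Hypothetical case 2: theta = 0, so x and y are uncorrelated.
S_axis = polar_to_cartesian_cov(100.0, 0.0, 1e-4, 4e-8)

print(S_circle)
print(S_axis)
```

In the first case both variances equal 4e-4 and the covariance vanishes for any direction; in the second case the x variance is the distance variance and the y variance is the squared perpendicular error.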


Example 9.9 (Single differences of pseudoranges) One of the simplest and best illustra- 
tions of the covariance propagation law is the process of taking differences. The difference 
of two receiver positions is the baseline vector between them. So we are solving for this dif- 
ference (x2 — x1 ) rather than the individual coordinates x; and x2. Of course the difference 
operator is linear, given by a simple 1 by 2 matrix [—1 1]. The hope is that the difference 
is more stable than the separate positions. The covariance matrix should be smaller. 

This example is in one dimension. The satellite and the two receivers in Figure 9.4 
are collinear, which never happens in practice. A later example (Section 14.5) will have 
two satellites and two receivers and work with double differences. 

Physically, taking the difference cancels the sources of error that are common to 
both receivers (the satellite clock error and almost all of the ionospheric delay). This is 
the basis for Differential GPS. There are many applications of DGPS, coming in Part III 
of this book. If one of the receiver positions is well established (the home base), then an 
accurate baseline gives an accurate position for the second receiver (the rover). Differential 
GPS allows us to undo the dithering of the satellite clock. Our goal here is to see how the 
difference (the baseline) can be more accurate than either endpoint—and to calculate its 
covariance by the propagation law. 

We can use the MATLAB command randn(1,100) to produce 100 explicit sample
points. Then the covariance matrices can be illustrated by Figure 9.5, and by the very
simple M-file oned. We owe the example and also its M-file to Mike Bevis. First come the
formulas.

[Figure 9.5  The one-dimensional normal distribution function and the $2\sigma$-curve of
confidence. Panels: “No error at origin” and “With error at origin”; both axes are labeled “Observed l”.]


The positions of the satellite and the receivers are $x_0 = 0$, $x_1 = 0.75$, $x_2 = 1.25$.
They are subject to independent random errors, normally distributed with mean zero and
standard deviations $\sigma_0 = 0.3$, $\sigma_1 = 0.1$, $\sigma_2 = 0.2$. We chose $\sigma_0$ largest to make the figures
clear. The measurement of $x_1$ (its pseudorange) has errors from the satellite and that first
receiver. A vector of n samples comes from

$$\rho_1 = 0.75 * \mathrm{ones}(1,n) + \sigma_1 * \mathrm{randn}(1,n) - \sigma_0 * \mathrm{randn}(1,n).$$

A similar command gives n samples of the pseudorange $\rho_2$. But since randn produces new
numbers every time, we must define del = $\sigma_0$ * randn(1,n) and subtract that same satellite
error from both $\rho_1$ and $\rho_2$. This is the whole point(!), that the difference eliminates the
satellite error. The M-file achieves this by $\sigma_0 = 0$, or we can work directly with $\rho_2 - \rho_1$.

Now come the covariance matrices. We use the important fact that the mean (the
average) of n samples has variance (and covariance) reduced by 1/n. Thus from the data
we compute

$$m_1 = \mathrm{mean}(\rho_1) \qquad\text{and}\qquad m_2 = \mathrm{mean}(\rho_2)$$

$$V = \mathrm{cov}(\rho_1, \rho_2) \qquad\text{and}\qquad V_m = V/n = \text{covariance of the mean}.$$

The first plot in Figure 9.5 shows the n sample values of $(\rho_1, \rho_2)$, centered about the mean
$(m_1, m_2)$. The small inner ellipse is controlled by $V_m$ and the outer ellipse by V. On
average 86% of the sample points should lie in that outer ellipse with nsigma set to 2.

The key point is that $\rho_1$ is correlated with $\rho_2$. Therefore the ellipses are tilted. The
same satellite error enters both $\rho_1$ and $\rho_2$, with the same sign. The correlation is positive.
The errors are large because of $\sigma_0 = 0.3$. The theoretical covariance matrix $\Sigma$ and the
sample covariance matrix V are

$$\Sigma = \begin{bmatrix}\sigma_1^2 & \sigma_{12}\\ \sigma_{12} & \sigma_2^2\end{bmatrix} \qquad\text{and}\qquad V_{ij} = \frac{1}{n-1}\,(\rho_i - m_i)^T(\rho_j - m_j).$$

Now take differences $d = \rho_2 - \rho_1$. The linear transformation is $T = [\,-1 \;\; 1\,]$. The
propagation law $T\Sigma T^T$ says that the covariance (or just variance) of the scalar d should be

$$\Sigma_d = \begin{bmatrix}-1 & +1\end{bmatrix}\Sigma\begin{bmatrix}-1\\ +1\end{bmatrix} = \sigma_1^2 - 2\sigma_{12} + \sigma_2^2.$$

Notice how the effect of $\sigma_0 = 0.3$ has disappeared from $\Sigma_d$. (Each diagonal entry of $\Sigma$ is a
receiver variance plus the satellite variance, and the off-diagonal entry is the satellite
variance alone, so the satellite terms cancel.) For the sample value of the
covariance of d we compute around the sample mean:

$$m_d = \mathrm{mean}(d) \qquad\text{and}\qquad V_d = \mathrm{cov}(d) = \frac{1}{n-1}\sum_{i=1}^n (d(i) - m_d)^2.$$

The MATLAB output compares the predicted $\Sigma_d$ with the actual $V_d$.
One more plot is of great interest. It shows the measured distances $d_1$ and $d_2$ to the
receivers, without the satellite error that they share:

$$d_1 = 0.75 * \mathrm{ones}(1,n) + \sigma_1 * \mathrm{randn}(1,n)$$
$$d_2 = 1.25 * \mathrm{ones}(1,n) + \sigma_2 * \mathrm{randn}(1,n).$$

Figure 9.5b shows the n points $(d_1, d_2)$, with the sample mean $m_{d_1,d_2}$ and the sample
covariance $V_{d_1,d_2}$. The error ellipse is now aligned with the axes. Taking differences has
decorrelated the errors in $d_1$ from the errors in $d_2$. The theoretical covariance matrix for
$(d_1, d_2)$ is diagonal:

$$\begin{bmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{bmatrix}.$$

This example verifies experimentally the propagation law. We used that law for
$\Sigma_d = T\Sigma T^T$. The samples of $d_2 - d_1$ give the experimental value of this covariance. The
probability is very high that it is close to $\Sigma_d$.
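
The experiment can be imitated without MATLAB. The following Python sketch is not the M-file oned; it is an independent illustration using the standard `random` module, with a fixed seed and the assumed helper `sample_cov`. It draws one shared satellite error per epoch, forms both pseudoranges, and compares the propagated variance of the difference with the directly computed one.

```python
import random


def sample_cov(u, v):
    """Sample covariance of two equally long lists (divisor n - 1)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


random.seed(1)                      # fixed seed so the run is repeatable
n = 20000
s0, s1, s2 = 0.3, 0.1, 0.2          # satellite and receiver standard deviations

delta = [s0 * random.gauss(0, 1) for _ in range(n)]   # shared satellite error
rho1 = [0.75 + s1 * random.gauss(0, 1) - e for e in delta]
rho2 = [1.25 + s2 * random.gauss(0, 1) - e for e in delta]
d = [b - a for a, b in zip(rho1, rho2)]               # single differences

V11 = sample_cov(rho1, rho1)
V12 = sample_cov(rho1, rho2)
V22 = sample_cov(rho2, rho2)

Vd_propagated = V11 - 2 * V12 + V22   # T V T^T with T = [-1, 1]
Vd_direct = sample_cov(d, d)
Sigma_d_theory = s1 ** 2 + s2 ** 2    # the satellite variance has cancelled

print(V12, Vd_propagated, Vd_direct, Sigma_d_theory)
```

The propagated and the direct sample variances agree to rounding error because the propagation law is an algebraic identity for sample covariances as well; both are close to the theoretical value, which no longer contains the satellite variance.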

As a final curiosity, consider two satellites and receivers all on the x axis. For
satellites 1 and 2 we can use single differences $d^1$ and $d^2$ as above. But the double difference
$d^1 - d^2$ from all four measurements is automatically zero. The measurements $\rho_1^2$ and $\rho_2^2$ to
the second satellite both include the distance s between satellites, and everything cancels
in the double difference:

$$\begin{aligned}
d^1 - d^2 &= (\rho_1^1 - \rho_2^1) - (\rho_1^2 - \rho_2^2)\\
&= (\rho_1^1 - \rho_2^1) - (\rho_1^1 + s - \rho_2^1 - s) = 0.
\end{aligned}$$

In two or three dimensions this will not occur! Length is nonlinear; we have differences of
square roots. Satellites and receivers are not collinear. Double differencing is the
fundamental tool in Chapter 15 on processing of GPS data.


Example 9.10 (Variance of local east-north-up coordinates from GPS) The relationship
between increments of geocentric coordinates (x, y, z) and of the local east-north-up
coordinates (e, n, u) is given by latitude $\varphi$ and longitude $\lambda$:

$$\begin{bmatrix}e\\ n\\ u\end{bmatrix} = \begin{bmatrix}-\sin\lambda & \cos\lambda & 0\\ -\sin\varphi\cos\lambda & -\sin\varphi\sin\lambda & \cos\varphi\\ \cos\varphi\cos\lambda & \cos\varphi\sin\lambda & \sin\varphi\end{bmatrix}\begin{bmatrix}x\\ y\\ z\end{bmatrix} = F\begin{bmatrix}x\\ y\\ z\end{bmatrix}. \qquad (9.58)$$

We want to compute the covariance matrix for e, n, u. It is based on a given covariance
matrix $\Sigma_{xyz}$ for the geocentric coordinate increments in x, y, z. The law of propagation of
variances gives

$$\Sigma_{enu} = F\Sigma_{xyz}F^T. \qquad (9.59)$$

A covariance matrix from practice, with units of mm², is

$$\Sigma_{xyz} = \begin{bmatrix}25 & -7.970 & 18.220\\ -7.970 & 4 & -6.360\\ 18.220 & -6.360 & 16\end{bmatrix}.$$

The position is $\varphi = 55^\circ 54'$, $\lambda = 12^\circ 29'$, so we get

$$F = \begin{bmatrix}-0.216 & 0.976 & 0\\ -0.808 & -0.179 & 0.561\\ 0.547 & 0.121 & 0.824\end{bmatrix}.$$

Hence the covariance matrix for the coordinate increments $\Delta e$, $\Delta n$, $\Delta u$ becomes

$$\Sigma_{enu} = \begin{bmatrix}8.34 & 3.96 & -14.90\\ 3.96 & 3.95 & -8.24\\ -14.90 & -8.24 & 32.52\end{bmatrix}.$$

The standard deviations are $\sigma_e = 2.9$ mm, $\sigma_n = 2.0$ mm and $\sigma_u = 5.7$ mm. These numbers
are in good agreement with experience. The vertical deviation $\sigma_u$ is about twice as large as
the in-plane deviations.
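
The propagation (9.59) can be recomputed from the latitude and longitude. The sketch below uses assumed helper names (`enu_rotation`, `propagate`) and full-precision sines and cosines, so F is exactly orthogonal and the trace of the covariance matrix is preserved. Under that assumption the east and north variances reproduce the printed values, while the up variance comes out near 32.7 rather than the printed 32.52, presumably because the printed F is rounded (its last entry, 0.824, differs slightly from the sine of the latitude).

```python
from math import sin, cos, radians


def enu_rotation(lat_deg, lon_deg):
    """Rotation F from geocentric (x, y, z) to local (e, n, u) increments."""
    phi, lam = radians(lat_deg), radians(lon_deg)
    return [[-sin(lam), cos(lam), 0.0],
            [-sin(phi) * cos(lam), -sin(phi) * sin(lam), cos(phi)],
            [cos(phi) * cos(lam), cos(phi) * sin(lam), sin(phi)]]


def propagate(F, S):
    """Covariance propagation F S F^T for small dense matrices (lists)."""
    rows, m = len(F), len(S)
    FS = [[sum(F[i][k] * S[k][j] for k in range(m)) for j in range(m)]
          for i in range(rows)]
    return [[sum(FS[i][k] * F[j][k] for k in range(m)) for j in range(rows)]
            for i in range(rows)]


Sigma_xyz = [[25.0, -7.970, 18.220],
             [-7.970, 4.0, -6.360],
             [18.220, -6.360, 16.0]]           # mm^2, values from the example

F = enu_rotation(55 + 54 / 60, 12 + 29 / 60)   # 55 deg 54 min, 12 deg 29 min
Sigma_enu = propagate(F, Sigma_xyz)

for row in Sigma_enu:
    print([round(v, 2) for v in row])
```

The standard deviations derived from the diagonal still round to 2.9 mm, 2.0 mm and 5.7 mm.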
The least-squares solution of a full rank problem comes from $A^T\Sigma_b^{-1}A\hat{x} = A^T\Sigma_b^{-1}b$.
The right side has covariance $A^T\Sigma_b^{-1}\Sigma_b\Sigma_b^{-1}A = A^T\Sigma_b^{-1}A = P$. The left side $P\hat{x}$ has
covariance $P\Sigma_{\hat{x}}P$ (again from the propagation law). Therefore $\Sigma_{\hat{x}} = P^{-1}$:

$$\Sigma_{\hat{x}} = (A^T\Sigma_b^{-1}A)^{-1} = \sigma_0^2(A^TCA)^{-1}. \qquad (9.60)$$


The covariance $\Sigma_{\hat{x}}$ is the inverse matrix from the normal equations. It is useful to
separate the scalar factor $\sigma_0^2$ (the variance of unit weight) so that $\Sigma_b = \sigma_0^2C^{-1}$. The Cholesky
decomposition of $A^TCA$ gives

$$(A^TCA)^{-1} = (R^TR)^{-1} = R^{-1}R^{-T}. \qquad (9.61)$$

The upper triangular R is computed by the MATLAB command R = chol(A' * C * A).

The variance propagation law is valid for all linear functions of the unknowns $\hat{x}$.
Each row in the j by n matrix F defines a linear function of $\hat{x}$. By the propagation law
the covariance matrix for $F\hat{x}$ is $F(A^T\Sigma_b^{-1}A)^{-1}F^T$. In the special case $f = Ax$, the best
estimators are $\hat{f} = A\hat{x} = p$. Sometimes they are called the estimated observations or
fitted values. Their covariance matrix is

$$\Sigma_p = A(A^T\Sigma_b^{-1}A)^{-1}A^T. \qquad (9.62)$$


This is the a posteriori covariance matrix for the observations. It is the covariance for the
projection p of b. The a priori covariance matrix, of course, is given as $\Sigma_b = \sigma_0^2C^{-1}$.

If A is n by n, the factors in $(A^TCA)^{-1}$ can be inverted and we get $\Sigma_p = \Sigma_b$; but
this is no longer a least-squares problem. The equation $Ax = b$ can be solved exactly.

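For assessing gross errors in geodetic practice we use the standardized residuals $\tilde{r}$:

$$\hat{r} = A\hat{x} - b \qquad\text{and}\qquad \tilde{r} = (\operatorname{diag}(\Sigma_{\hat{r}}))^{-1/2}\,\hat{r}. \qquad (9.63)$$

This achieves $\sigma^2(\tilde{r}_1) = \cdots = \sigma^2(\tilde{r}_m) = 1$. Björck (1996) uses $\operatorname{diag}(A\Sigma A^T)$ instead.
In most cases the variance of unit weight $\sigma_0^2$ is unknown. Statistical theory gives the
following unbiased estimate of $\sigma_0^2$ when A is m by n:

$$\hat\sigma_0^2 = \frac{\hat{r}^TC\hat{r}}{m - n} = \frac{\|b - A\hat{x}\|_C^2}{m - n}, \qquad \|v\|_C^2 = v^TCv. \qquad (9.64)$$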
We shall outline a proof in case $\Sigma_b = \sigma_0^2I$. If we want a valid proof for $\Sigma_b \neq \sigma_0^2I$ we have
to decorrelate the variables by the transformation $b' = Wb$, $A' = WA$. This is described
in Section 11.8 on weight normalization, where $C = W^TW$.

The minimum value for the sum of squares is given by

$$\hat{r}^T\hat{r} = (b - p)^T(b - p) = (b - Pb)^T(b - Pb) = b^T(I - P)(I - P)b = b^T(I - P)b. \qquad (9.65)$$

Here we exploited that P and I − P are projections: $P = P^T = P^2$. The expression
$b^T(I - P)b$ is a quadratic form, therefore with a $\chi^2$ distribution. The number of degrees
of freedom is the rank of I − P. When P projects onto the n-dimensional column space
of A, its rank is n and the rank of I − P is m − n. Hence

$$b^T(I - P)b \sim \sigma_0^2\chi^2_{m-n}.$$

To prove (9.64) we compute the mean value, using equation (9.10) for $E\{\chi^2\}$:

$$E\{b^T(I - P)b\} = \sigma_0^2E\{\chi^2_{m-n}\} = \sigma_0^2(m - n).$$


In words, the number of degrees of freedom in a sum of squares is diminished by the 
number of estimated parameters. In 1889 the Danish actuary T. N. Thiele presented a rea- 
soning based on the only assumption that the observations are independent and normally 
distributed. He found that in the most simple linear model (the canonical model) n
observations have unknown mean values and the remaining m − n have known mean values.
According to Thiele estimation in this model is evident. Next he showed that any linear 
model can be turned into a canonical form by an orthogonal transformation. The estima- 
tors in this model, by the inverse transformation, can be expressed by means of the original 
observations. This fundamental idea of Thiele was not understood by his contemporaries. 
Only in the 1930s was the canonical form of the linear normal model rediscovered.
A comprehensive historical exposition is given in Seal (1967). 
Example 9.11 We apply equation (9.64) to Example 8.1 where $\hat{r}^TC\hat{r} = 0.000167$ and
m − n = 5 − 2:

$$\hat\sigma_0^2 = \frac{\hat{r}^TC\hat{r}}{m - n} = \frac{0.000167}{3} = 0.000055\ \mathrm{m^2/km} \qquad\text{and}\qquad \hat\sigma_0 = 0.0075\ \mathrm{m}/\sqrt{\mathrm{km}}.$$

This variance is valid for the weight c = 1. According to definition (9.44) this corresponds
to a leveling line of length 1 km. The variance is called the variance of unit weight.
The covariance matrix, in units of m², is

$$\Sigma_{\hat{x}} = \hat\sigma_0^2(A^TCA)^{-1} = \begin{bmatrix}0.2143 & 0.0779\\ 0.0779 & 0.2084\end{bmatrix} \times 10^{-4}.$$

Finally the variances of the estimated heights $H_D$ and $H_E$ are the diagonal entries of $\Sigma_{\hat{x}}$:

$$\hat\sigma_{H_D}^2 = 0.000021\ \mathrm{m^2}, \qquad \hat\sigma_{H_D} = 0.0046\ \mathrm{m};$$
$$\hat\sigma_{H_E}^2 = 0.000021\ \mathrm{m^2}, \qquad \hat\sigma_{H_E} = 0.0046\ \mathrm{m}.$$

The standard deviation $\hat\sigma_0 = 7$ mm/$\sqrt{\mathrm{km}}$ for 1 kilometer of leveling does characterize the
network as a fairly good geometric leveling.


Example 9.12 In Example 8.6 the standard deviation of unit weight is $\hat\sigma_0 = \|\hat{r}\|/\sqrt{3}$.
This is 0.0168. The standard deviation of each component of the solution vector $\hat{x}$ is
0.0119 because the inverse coefficient matrix for the normals is

$$(A^TA)^{-1} = \begin{bmatrix}\frac12 & -\frac14 & 0\\ -\frac14 & \frac12 & -\frac14\\ 0 & -\frac14 & \frac12\end{bmatrix}.$$


Solving the Normal Equations and Estimating $\hat{r}^TC\hat{r}$


Now we turn to the computational aspects of solving a least-squares problem. This means
solving the normal equations and finding expressions for various covariance matrices and
error quantities, specifically the weighted sum of squared residuals $\hat{r}^TC\hat{r}$.
The topic is treated in more detail in Chapter 11. Here we restrict ourselves to using two
standard methods: the QR factorization and the Cholesky method.


QR Factorization  For any m by n matrix A there is an m by m matrix Q with orthonormal
columns such that

$$Q^{-1}A = Q^TA = \begin{bmatrix}R\\ 0\end{bmatrix}. \qquad (9.66)$$

The upper triangular matrix R has nonnegative diagonal entries. This QR decomposition
of A is established in Section 4.4; it is the matrix statement of the Gram-Schmidt process.
If A has full rank n, then R equals the Cholesky factor of $A^TA$:

$$A^TA = \begin{bmatrix}R^T & 0\end{bmatrix}Q^TQ\begin{bmatrix}R\\ 0\end{bmatrix} = R^TR.$$
Table 9.7  Formulas for weighted least squares

Observation equations (A is m by n)       $Ax = b - r$
Covariance matrix for observations        $\Sigma_b$ (notice $\sigma_0^2 = 1$ a priori)
Weight matrix (m by m)                    $C = \Sigma_b^{-1}$
Normal equations (n by n)                 $A^TCA\hat{x} = A^TCb$
Solution of normal equations              $\hat{x} = (A^TCA)^{-1}A^TCb$
Estimated observations                    $\hat{p} = A\hat{x} = A(A^TCA)^{-1}A^TCb$
Estimated residuals                       $\hat{r} = b - A\hat{x}$
Estimated weighted sum of squares         $\hat{r}^TC\hat{r} = b^TCb - b^TCA\hat{x}$
Estimated variance of unit weight         $\hat\sigma_0^2 = \hat{r}^TC\hat{r}/(m - n)$
Covariance matrix for unknowns            $\Sigma_{\hat{x}} = \hat\sigma_0^2(A^TCA)^{-1} = (A^T\Sigma_b^{-1}A)^{-1}$
Covariance matrix for observations        $\Sigma_{A\hat{x}} = \hat\sigma_0^2A(A^TCA)^{-1}A^T$
Covariance matrix for residuals           $\Sigma_{\hat{r}} = \hat\sigma_0^2\big(C^{-1} - A(A^TCA)^{-1}A^T\big)$


Since Q is orthogonal it preserves lengths so that $c = Q^Tb$ has the same length as b:

$$\|b - Ax\|^2 = \|Q^Tb - Q^TAx\|^2 = \left\|\begin{bmatrix}c_1\\ c_2\end{bmatrix} - \begin{bmatrix}R\\ 0\end{bmatrix}x\right\|^2 = \|c_2\|^2 + \|Rx - c_1\|^2. \qquad (9.67)$$

The residual norm is minimized by the least-squares solution $\hat{x} = R^{-1}c_1$. The minimum
residual equals the norm of $c_2$.


Cholesky Factorization  Often the Cholesky factorization applies not to the normal
equations $A^TCA$ themselves but rather to a system augmented with a column and a row:

$$\begin{bmatrix}A^TCA & A^TCb\\ (A^TCb)^T & b^TCb\end{bmatrix}. \qquad (9.68)$$

The lower triangular Cholesky factor of the augmented matrix is

$$\tilde{L} = \begin{bmatrix}L & 0\\ z^T & s\end{bmatrix}$$

with blocks L (n by n), 0 (n by 1), $z^T$ (1 by n) and s (1 by 1). We look closely at the
lower right entry s and compute the product

$$\tilde{L}\tilde{L}^T = \begin{bmatrix}L & 0\\ z^T & s\end{bmatrix}\begin{bmatrix}L^T & z\\ 0 & s\end{bmatrix} = \begin{bmatrix}LL^T & Lz\\ z^TL^T & z^Tz + s^2\end{bmatrix}. \qquad (9.69)$$

Comparing this product with (9.68) we get

$$z^Tz + s^2 = b^TCb.$$

The Cholesky factorization finds an n by n lower triangular matrix L such that $LL^T = A^TCA$.
Given this “square root” L, the normal equations can be solved quickly and stably
via two triangular systems:

$$Lz = A^TCb \qquad\text{and}\qquad L^T\hat{x} = z.$$

These two systems are identical to $A^TCA\hat{x} = A^TCb$. (Multiply the second by L and
substitute the first.) From these systems also follows $z = L^{-1}A^TCb$ and consequently

$$z^Tz = b^TCAL^{-T}L^{-1}A^TCb = b^TCA\underbrace{(A^TCA)^{-1}A^TCb}_{\hat{x}} = b^TCA\hat{x}. \qquad (9.70)$$

Insertion into (9.69) reveals that $\hat{r}^TC\hat{r}$ can be found directly from s. With $\hat{r} = A\hat{x} - b$ as in (9.63):

$$s^2 = b^TCb - z^Tz = b^TCb - b^TCA\hat{x} = -b^TC\hat{r} = \hat{r}^TC\hat{r} - \hat{x}^T\underbrace{A^TC\hat{r}}_{0} = \hat{r}^TC\hat{r}. \qquad (9.71)$$

We repeat the contrast between the QR and the Cholesky approach to the normal equations.
One works very stably with A = QR (orthogonalization). The other works more simply
but a little more dangerously with $A^TCA = LL^T$.
The QR decomposition solves the normals as $\hat{x} = R^{-1}Q^Tb$ and $\hat{r}^TC\hat{r} = \|c_2\|^2$. The
Cholesky method solves two triangular systems: $Lz = A^TCb$ and $L^T\hat{x} = z$. Then finally
$\hat{r}^TC\hat{r} = s^2$. All relevant formulas are surveyed in Table 9.7.
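
The augmented Cholesky approach can be sketched compactly. The Python code below is an illustration under stated assumptions: the function names `cholesky` and `augmented_solve` are hypothetical, the weight matrix is taken to be diagonal, and the small line-fitting data set is made up. It factors the augmented matrix (9.68), reads z and s from the last row of the factor, back-substitutes for the solution, and returns the square of s as the weighted sum of squared residuals.

```python
from math import sqrt


def cholesky(M):
    """Lower triangular L with L L^T = M for a symmetric positive definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = sqrt(s) if i == j else s / L[j][j]
    return L


def augmented_solve(A, c, b):
    """Weighted least squares with diagonal weights c.

    Returns (x_hat, rCr) where rCr is the weighted sum of squared residuals,
    obtained as s^2 from the augmented Cholesky factor.
    """
    m, n = len(A), len(A[0])
    N = [[sum(c[k] * A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]                                    # A^T C A
    rhs = [sum(c[k] * A[k][i] * b[k] for k in range(m)) for i in range(n)]
    bCb = sum(c[k] * b[k] ** 2 for k in range(m))
    augmented = [N[i] + [rhs[i]] for i in range(n)] + [rhs + [bCb]]
    Lt = cholesky(augmented)
    z, s = Lt[n][:n], Lt[n][n]          # last row of the factor: [z^T, s]
    x = [0.0] * n                       # back substitution for L^T x = z
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(Lt[k][i] * x[k] for k in range(i + 1, n))) / Lt[i][i]
    return x, s * s


# Hypothetical data: fit a straight line through three points, unit weights.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [0.0, 1.0, 3.0]
c = [1.0, 1.0, 1.0]
x_hat, rCr = augmented_solve(A, c, b)
print(x_hat, rCr)
```

One caveat of this sketch: when the residuals are exactly zero, rounding can make the last pivot slightly negative and the square root fails, which is one practical reason for the remark that the Cholesky route is "a little more dangerous" than QR.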


9.8 Confidence Ellipses 


Let the estimate $(\hat{X}_1, \hat{X}_2)$ be the coordinates of one particular network point. These
estimated coordinates have the covariance matrix

$$\Sigma = \begin{bmatrix}\sigma_1^2 & \sigma_{12}\\ \sigma_{21} & \sigma_2^2\end{bmatrix} = \sigma_0^2Q. \qquad (9.72)$$

This 2 by 2 matrix is positive definite; so its inverse exists and is likewise positive definite.
We introduce a local coordinate system with origin at $(\hat{X}_1, \hat{X}_2)$ and with axes parallel to
the original ones. A statistician would write $x \sim N_2(0, \sigma_0^2Q)$ which means that $(x_1, x_2)$
has a two-dimensional normal distribution with zero mean and covariance matrix $\sigma_0^2Q$.

In the local system $(x_1, x_2)$ is a point on the curve described by the quadratic form

$$x^T\Sigma^{-1}x = c^2. \qquad (9.73)$$

This curve is an ellipse because $\Sigma^{-1}$ is positive definite. It is the confidence ellipse of the
point, or error ellipse if it is conceived as a pure geometric quantity. It corresponds to the
M-file errell.
We denote a unit vector in the direction $\varphi$ by $\xi = (\cos\varphi, \sin\varphi)$. The expression

$$\xi^Tx = x_1\cos\varphi + x_2\sin\varphi$$

is the point error in the direction of $\xi$. It is the projection of x in this direction $\xi$.
From the law of error propagation (9.56), the variance of $\xi^Tx$ is

$$\sigma^2(\xi^Tx) = \xi^T\Sigma\xi = \sigma_1^2\cos^2\varphi + 2\sigma_{12}\cos\varphi\sin\varphi + \sigma_2^2\sin^2\varphi. \qquad (9.74)$$
Figure 9.6 The principal axes of the confidence ellipse.


The maximum and minimum values are in the directions of the axes of the confidence
ellipse. They are found as solutions to the equation $d\sigma^2/d\varphi = 0$:

$$-(\sigma_1^2 - \sigma_2^2)\sin 2\varphi + 2\sigma_{12}\cos 2\varphi = 0.$$

If $\sigma_1^2 = \sigma_2^2$ then the ellipse is oriented in the direction $\varphi = 45^\circ$ (50 gon). If furthermore
$\sigma_{12} = 0$ then the axes are indeterminate and $\Sigma = \sigma^2I$ and the ellipse is a circle. In all
other cases the axis direction $\varphi$ is determined through

$$\tan 2\varphi = \frac{2\sigma_{12}}{\sigma_1^2 - \sigma_2^2}. \qquad (9.75)$$


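Now we rotate the $x_1x_2$ system by this angle $\varphi$ around the point $(\hat{X}_1, \hat{X}_2)$ to a $y_1y_2$
system. The $y_1$ axis is collinear with the major axis of the confidence ellipse; the off-diagonal
entries of the covariance matrix now vanish; hence $y_1$ and $y_2$ are independent. The
eigenvectors of $\Sigma$ diagonalize the matrix (and its inverse). Since $y_1$ is in the direction of
the major axis we have $\lambda_1 > \lambda_2$ and the equation of the ellipse is

$$\begin{bmatrix}y_1 & y_2\end{bmatrix}\begin{bmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{bmatrix}^{-1}\begin{bmatrix}y_1\\ y_2\end{bmatrix} = c^2 \qquad (9.76)$$

or

$$\frac{y_1^2}{(c\sqrt{\lambda_1})^2} + \frac{y_2^2}{(c\sqrt{\lambda_2})^2} = 1. \qquad (9.77)$$

Notice that $\lambda_1$ and $\lambda_2$ are the eigenvalues of $\Sigma$; their reciprocals are the eigenvalues of
the inverse matrix $\Sigma^{-1}$. They are given by

$$\left.\begin{matrix}\lambda_1\\ \lambda_2\end{matrix}\right\} = \tfrac12\Big(\sigma_1^2 + \sigma_2^2 \pm \sqrt{(\sigma_1^2 + \sigma_2^2)^2 - 4(\sigma_1^2\sigma_2^2 - \sigma_{12}^2)}\Big). \qquad (9.78)$$

By this we have the explicit expressions for determining the confidence ellipse:
1 The semi major axis is $a = c\sqrt{\lambda_1}$
2 The semi minor axis is $b = c\sqrt{\lambda_2}$
3 The major axis is rotated $\varphi$ away from the $x_1$ axis.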
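
The three expressions above translate directly into code. The following Python sketch is not the M-file errell; the function name `confidence_ellipse` is an assumption. It uses the two-argument arctangent so that the returned angle is the direction of the major axis rather than the minor one, which the tangent in (9.75) alone cannot distinguish.

```python
from math import atan2, sqrt, degrees


def confidence_ellipse(var1, var2, cov12, c=1.0):
    """Semi-axes (a, b) and orientation phi (radians) of the ellipse x^T S^-1 x = c^2.

    var1, var2, cov12 are the entries of the 2 by 2 covariance matrix S.
    """
    half_trace = 0.5 * (var1 + var2)
    root = 0.5 * sqrt((var1 + var2) ** 2 - 4.0 * (var1 * var2 - cov12 ** 2))
    lam1, lam2 = half_trace + root, half_trace - root   # eigenvalues, lam1 >= lam2
    phi = 0.5 * atan2(2.0 * cov12, var1 - var2)         # direction of major axis
    return c * sqrt(lam1), c * sqrt(lam2), phi


# Hypothetical covariance matrix [[2, 1], [1, 2]]: eigenvalues 3 and 1.
a, b_axis, phi = confidence_ellipse(2.0, 2.0, 1.0)
print(a, b_axis, degrees(phi))
```

For equal variances and positive covariance the major axis points along 45 degrees, in agreement with the remark preceding (9.75).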
The Support Function


The confidence ellipse is fully described. Next we shall demonstrate a geometric
interpretation of the variance $\sigma^2$ in (9.74) in the direction $\xi$. For convenience we rotate the ellipse
to the $y_1y_2$ principal axes, and use $\psi$ for the angle from the $y_1$ axis. The equation of the
ellipse is

$$\frac{y_1^2}{a^2} + \frac{y_2^2}{b^2} = 1. \qquad (9.79)$$

The support function $p(\psi)$ is the maximum of $y_1\cos\psi + y_2\sin\psi$ over the ellipse. This is
not the distance from (0, 0) to the boundary along the $\psi$-line. If the tangent perpendicular
to that direction touches the ellipse at $y_\psi$, then the support $p(\psi)$ measures the projection
of $y_\psi$ in the direction $\psi$.

Figure 9.7 shows the support distance p(0), measured to the vertical tangent
perpendicular to $\psi = 0$. It also shows the distance $p(\pi/2)$ to the horizontal tangent. All convex
sets have support functions p. For an ellipse this is a fourth-order curve and not easily
drawable, see Figure 9.7. The M-file support calculates and plots the support function of
any ellipse, given the positive definite matrix $\Sigma$.

Confidence ellipses close to circular shape are close to their support curves. But for 
flat ellipses the difference is large except in the four small sectors around the end points. 

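We can connect $p(\psi)$ to $\sigma^2$ in (9.74). The length of the projection of the tangent
point $y_\psi = (y_1, y_2)$ is

$$p(\psi) = |y_1\cos\psi + y_2\sin\psi|.$$

We square this expression and get

$$p^2(\psi) = y_1^2\cos^2\psi + 2y_1y_2\cos\psi\sin\psi + y_2^2\sin^2\psi. \qquad (9.80)$$

The equation for the tangent to the ellipse at $y_\psi$ is

$$-\frac{y_1}{a^2}\sin\psi + \frac{y_2}{b^2}\cos\psi = 0.$$

We square and multiply by $a^2b^2$:

$$\frac{b^2}{a^2}\,y_1^2\sin^2\psi + \frac{a^2}{b^2}\,y_2^2\cos^2\psi - 2y_1y_2\sin\psi\cos\psi = 0.$$

Next we add this to (9.80) in order that the mixed products cancel:

$$\begin{aligned}
p^2(\psi) &= \Big(y_1^2\cos^2\psi + \frac{b^2}{a^2}\,y_1^2\sin^2\psi\Big) + \Big(y_2^2\sin^2\psi + \frac{a^2}{b^2}\,y_2^2\cos^2\psi\Big)\\
&= \Big(\frac{y_1^2}{a^2} + \frac{y_2^2}{b^2}\Big)\big(a^2\cos^2\psi + b^2\sin^2\psi\big).
\end{aligned}$$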
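[Figure 9.7  The support function at $\psi = 0$ and $\psi = \pi/2$.]

Using (9.79) for the special point $y_\psi$ we get

$$p^2(\psi) = a^2\cos^2\psi + b^2\sin^2\psi = \begin{bmatrix}\cos\psi & \sin\psi\end{bmatrix}\begin{bmatrix}a^2 & 0\\ 0 & b^2\end{bmatrix}\begin{bmatrix}\cos\psi\\ \sin\psi\end{bmatrix}.$$

In the original $x_1x_2$ system (where the angle is $\varphi$) this is

$$p^2(\varphi) = \begin{bmatrix}\cos\varphi & \sin\varphi\end{bmatrix}\begin{bmatrix}\sigma_1^2 & \sigma_{12}\\ \sigma_{21} & \sigma_2^2\end{bmatrix}\begin{bmatrix}\cos\varphi\\ \sin\varphi\end{bmatrix}. \qquad (9.81)$$

This proves that the support function is actually the standard deviation $\sigma$ in (9.74).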
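
A numerical check makes the difference between the support function and the radial distance concrete. This Python sketch is not the M-file support; the function names are assumptions. It maximizes the projection over a dense sampling of the ellipse and compares the result with the closed-form square root of $a^2\cos^2\psi + b^2\sin^2\psi$ and with the radius of the ellipse in the same direction.

```python
from math import cos, sin, sqrt, pi


def support_numeric(a, b, psi, steps=20000):
    """Maximum of y1 cos(psi) + y2 sin(psi) over sampled ellipse points."""
    return max(a * cos(t) * cos(psi) + b * sin(t) * sin(psi)
               for t in (2.0 * pi * k / steps for k in range(steps)))


def support_formula(a, b, psi):
    """Closed form p(psi) = sqrt(a^2 cos^2(psi) + b^2 sin^2(psi))."""
    return sqrt(a ** 2 * cos(psi) ** 2 + b ** 2 * sin(psi) ** 2)


def radial_distance(a, b, psi):
    """Distance from the center to the ellipse boundary along direction psi."""
    return a * b / sqrt(b ** 2 * cos(psi) ** 2 + a ** 2 * sin(psi) ** 2)


# Hypothetical flat ellipse with semi-axes 3 and 1, direction 45 degrees.
a, b, psi = 3.0, 1.0, pi / 4
p_num = support_numeric(a, b, psi)
p_formula = support_formula(a, b, psi)
r_boundary = radial_distance(a, b, psi)
print(p_num, p_formula, r_boundary)
```

For this flat ellipse the support value is clearly larger than the radial distance, while along the principal axes the two coincide.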


The preceding considerations were all based on a geometrical view of the quadratic form
$x^T\Sigma^{-1}x$ = constant. This view shall now be complemented by a statistical aspect.

The degree of confidence $\alpha$ for the ellipse (9.73) is given by the probability that
the ellipse contains the true position $(X_1, X_2)$. There are two situations: the covariance
matrix $\Sigma$ can be known or it may need rescaling by an unknown factor. Either $\sigma_0^2 = 1$ or
else the variance is estimated by $\hat\sigma_0^2 = \hat{r}^TC\hat{r}/(m - n)$.


Known covariance matrix  The random variables $y_1/\sqrt{\lambda_1}$ and $y_2/\sqrt{\lambda_2}$ are independent
and normally distributed with mean value 0 and variance 1. Statistics tells us that the
sum of squares $y_1^2/\lambda_1 + y_2^2/\lambda_2$ is $\chi^2$-distributed with 2 degrees of freedom.
The probability that

$$\frac{y_1^2}{(c_\alpha\sqrt{\lambda_1})^2} + \frac{y_2^2}{(c_\alpha\sqrt{\lambda_2})^2} \le 1 \qquad (9.82)$$

is $K(c_\alpha^2) = 1 - e^{-c_\alpha^2/2}$, cf. (9.19) and Table 9.6. Specifically this means that if the
confidence ellipse is a circle with radius 10 cm, then every 10th sample point falls outside a
circle with radius 21.5 cm. Every 20th point falls outside a circle with radius 24.5 cm.
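
The radii quoted above follow from inverting the probability formula. The Python sketch below uses assumed function names (`scale_factor`, `probability_inside`) and only the chi-square distribution with 2 degrees of freedom, for which the distribution function has the closed form used in the text.

```python
from math import exp, log, sqrt


def scale_factor(alpha):
    """Scale c_alpha such that the ellipse (9.82) has probability alpha."""
    return sqrt(-2.0 * log(1.0 - alpha))


def probability_inside(c):
    """Probability 1 - exp(-c^2 / 2) that a point lies inside the c-sigma ellipse."""
    return 1.0 - exp(-c * c / 2.0)


radius_90 = 10.0 * scale_factor(0.90)   # cm, for a 10 cm unit circle
radius_95 = 10.0 * scale_factor(0.95)
p_two_sigma = probability_inside(2.0)   # the "2 sigma" ellipse
print(radius_90, radius_95, p_two_sigma)
```

The last value also explains the 86% quoted earlier for the outer ellipse with nsigma set to 2: in two dimensions a 2-sigma ellipse captures noticeably less than the familiar 95% of the one-dimensional case.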


Unknown covariance matrix  We insert (9.72) into the quadratic form (9.73) and get

$$x^T\Sigma^{-1}x = \frac{x^TQ^{-1}x}{\hat\sigma_0^2}. \qquad (9.83)$$

[Figure 9.8  Confidence ellipses of points 1 and 2 and their relative confidence ellipse.
The figure illustrates the geometrical meaning of $\sigma_l$ and $\sigma_\alpha$.]

The numerator and denominator are independent with the distributions for $\chi^2_2$ and $\chi^2_{m-n}$.
Hence the left side has the $F_{2,m-n}$ distribution given by (9.20). Note that Q is known a
priori.

For m − n sufficiently large, say 50, we can act as if $\hat\sigma_0$ were known; see Abramowitz
& Stegun (1972), Table 26.9. We can choose $\alpha$ and $c_\alpha^2$ such that

$$\mathrm{Prob}(x^T\Sigma^{-1}x/2 \le c_\alpha^2) = \mathrm{Prob}(F_{2,m-n} \le c_\alpha^2) = \alpha.$$

This describes the ellipse in which x must lie with the prescribed probability $\alpha$.


Relative Confidence Ellipses 


A very important step in geodesy and GPS is to work with differences. We saw a one- 
dimensional example in Section 9.1 and in the M-file oned. The separate errors from a 
satellite to two receivers partly disappeared in the difference x2 — x1. The large tilted error 
ellipse (with off-diagonal correlation from the shared satellite error) became a smaller up- 
right ellipse in Figure 9.5 from a diagonal covariance matrix. This is the relative confidence 
ellipse. We now extend this idea to two dimensions. 

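Points 1 and 2 have coordinates $(x_1, y_1)$ and $(x_2, y_2)$. The vector d between the
points has components

$$d_x = x_2 - x_1 \qquad\text{and}\qquad d_y = y_2 - y_1.$$

This vector comes from multiplying the full set of four unknowns $x_1, y_1, x_2, y_2$ by the
matrix

$$L = \begin{bmatrix}-1 & 0 & +1 & 0\\ 0 & -1 & 0 & +1\end{bmatrix} = \begin{bmatrix}-I & I\end{bmatrix}.$$

Then the propagation law says that the covariance matrix of the vector $d = (d_x, d_y)$ is
$\Sigma_d = L\Sigma L^T$:

$$\Sigma_d = \begin{bmatrix}-I & I\end{bmatrix}\begin{bmatrix}\Sigma_1 & \Sigma_{12}\\ \Sigma_{12}^T & \Sigma_2\end{bmatrix}\begin{bmatrix}-I\\ I\end{bmatrix} = \Sigma_1 - \Sigma_{12} - \Sigma_{12}^T + \Sigma_2. \qquad (9.84)$$

Each of these blocks is 2 by 2. The diagonal blocks $\Sigma_1$ and $\Sigma_2$ contain covariances for
point 1 and point 2 separately. They correspond to ellipses 1 and 2 in Figure 9.8, and the
matrices are

$$\Sigma_1 = \begin{bmatrix}\sigma_{x_1}^2 & \sigma_{x_1y_1}\\ \sigma_{x_1y_1} & \sigma_{y_1}^2\end{bmatrix} \qquad\text{and}\qquad \Sigma_2 = \begin{bmatrix}\sigma_{x_2}^2 & \sigma_{x_2y_2}\\ \sigma_{x_2y_2} & \sigma_{y_2}^2\end{bmatrix}.$$

The off-diagonal blocks $\Sigma_{12}$ and $\Sigma_{12}^T$ in the full 4 by 4 symmetric matrix $\Sigma$ contain
covariances $\sigma_{x_1x_2}$, $\sigma_{x_1y_2}$, $\sigma_{x_2y_1}$, $\sigma_{y_1y_2}$ between points 1 and 2. Then the combination (9.84) of the
four blocks is the relative covariance matrix $\Sigma_d$:

$$\Sigma_d = \begin{bmatrix}\sigma_{x_1}^2 - 2\sigma_{x_1x_2} + \sigma_{x_2}^2 & \sigma_{x_1y_1} - \sigma_{x_1y_2} - \sigma_{x_2y_1} + \sigma_{x_2y_2}\\ \sigma_{x_1y_1} - \sigma_{x_1y_2} - \sigma_{x_2y_1} + \sigma_{x_2y_2} & \sigma_{y_1}^2 - 2\sigma_{y_1y_2} + \sigma_{y_2}^2\end{bmatrix}. \qquad (9.85)$$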

This covariance matrix $\Sigma_d$ produces a relative confidence ellipse with the following
remarkable and simple properties. The standard deviation $\sigma_l$ of the length of the vector,
$l^2 = (d_x)^2 + (d_y)^2$, can be found in Figure 9.8. The angle $\alpha$ has tangent $d_y/d_x$. The
quantities $\sigma_l$ and $l\sigma_\alpha$ are determined by the support function in the vector direction and
the perpendicular direction. This geometrical construction makes the relative confidence
ellipse useful. Of course points 1 and 2 can be any two points in the network.

The relative ellipse is often smaller than the separate ellipses 1 and 2. Typically this
is the case when the covariances $\sigma_{x_1x_2}$ and $\sigma_{y_1y_2}$ are positive. Then errors common to the
two points are removed in their difference $d = (x_2 - x_1, y_2 - y_1)$.

The M-files relellip and ellaxes compute the relative confidence ellipse. 
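
The block combination (9.84) can also be sketched in a few lines of Python. This is only an illustration and not the M-file relellip; the function name `relative_covariance` and the numbers are hypothetical. Both points are given variance 4 in each coordinate and a shared error with covariance 3, so most of the error should cancel in the difference.

```python
def relative_covariance(sigma):
    """Return the 2x2 covariance of d = (x2 - x1, y2 - y1).

    sigma is the full 4x4 covariance of (x1, y1, x2, y2) as a list of rows.
    The result equals Sigma1 - Sigma12 - Sigma12^T + Sigma2.
    """
    L = [[-1, 0, 1, 0],
         [0, -1, 0, 1]]
    # L * sigma, a 2x4 matrix
    ls = [[sum(L[i][k] * sigma[k][j] for k in range(4)) for j in range(4)]
          for i in range(2)]
    # (L * sigma) * L^T, a 2x2 matrix
    return [[sum(ls[i][k] * L[j][k] for k in range(4)) for j in range(2)]
            for i in range(2)]


# Hypothetical covariance: variance 4 per coordinate, covariance 3 between
# the x coordinates of the two points and between their y coordinates.
sigma = [[4.0, 0.0, 3.0, 0.0],
         [0.0, 4.0, 0.0, 3.0],
         [3.0, 0.0, 4.0, 0.0],
         [0.0, 3.0, 0.0, 4.0]]
sigma_d = relative_covariance(sigma)
```

With these numbers each diagonal entry of `sigma_d` is 4 - 2(3) + 4 = 2, smaller than the variance 4 of either point alone.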


10 


NONLINEAR PROBLEMS 


10.1 Getting Around Nonlinearity 


This section introduces two new items: a plane coordinate system in which many geodetic 
computations take place, and the concept of nonlinearity which we remedy by applying 
the process of linearization. The plane coordinate system involves two unknowns for the 
position of each nodal point. Leveling networks involved only one unknown, the height. 

Basically, geodesy is about computing coordinates of selected points on or close to 
the surface of the Earth. This goal is achieved by making appropriate observations between 
these points. The most common forms are slope distances, zenith distances and horizontal 
directions. We take more observations than necessary. Here the principle of least squares 
enters the scene. It yields a unique result. Computing variances also furnishes us with 
statements about the accuracy. 

The accuracy measures are mostly for the benefit of the geodesist and the coordinates 
are the product delivered to the eventual customer. After each discussion of the problem 
and its linearization and its solution, we will add an important paragraph about variances. 

From ancient times it has been a tradition to characterize the position of a point by its
latitude $\varphi$ and longitude $\lambda$. We have inherited this tradition and even today most geodetic
xy coordinate systems are oriented with x axis positive to the North, y axis positive to the
East. Bearings $\alpha$ are reckoned counter clockwise from the positive x axis, see Figure 10.1.
So in the usual rotation matrices we must substitute the rotational angle $\theta$ by $-\theta$.

Until now all observation equations have been linear. They are characterized by 
having constant entries in the matrix A. Only very few observation equations are that 
simple. Even a distance observation results in a nonlinear equation: 


$$f_{i,\text{obsv}} = \sqrt{(X_k - X_j)^2 + (Y_k - Y_j)^2} + r_i. \tag{10.1}$$


Here and in the following we denote the coordinates of points by capitals while small x’s 
and y’s denote coordinate increments. 

The principle of least squares leads directly to the normal equations in case of linear
observation equations. Nonlinear observation equations first have to be linearized. This
can only happen if we know preliminary values for the unknowns and subsequently solve
for the coordinate increments. When those increments are small, the linearization is
justified. The procedure is to be repeated, that is, iterated. The estimates hopefully move
the solution point closer and closer to the true solution. More about uniqueness in
Section 10.1.

Figure 10.1 The coordinate system.

Thus we start by linearizing around the basepoint $X^0 = (X_j^0, X_k^0, Y_j^0, Y_k^0)$. We
assume that these preliminary values are known. They will be denoted by the superscript $^0$.
In vector form the linearized observation equation is

$$f(X) = f(X^0) + [X - X^0]^{T}\,\operatorname{grad} f\,\big|_{X=X^0} + \text{second order terms}. \tag{10.2}$$

The increments of the coordinates are $X - X^0 = (x_j, x_k, y_j, y_k)$. In the calculations we
restrict ourselves to a first order approximation (a linearization):

$$f(X) - f(X^0) = [X - X^0]^{T}\,\operatorname{grad} f\,\big|_{X=X^0}. \tag{10.3}$$


The gradient of a function has the geometric meaning, independent of the coordinate sys- 
tem in use, of giving the derivative of f in all directions. Equation (10.3) can be written 


change in f = (change in position) x (gradient of f). (10.4) 


In Figure 10.2 we show the meaning of this equation when f is a function of one or more 
variables. 
Now (10.2) can be written as

$$f_i = \sqrt{(X_k^0 - X_j^0)^2 + (Y_k^0 - Y_j^0)^2} + \begin{bmatrix} x_j & x_k & y_j & y_k \end{bmatrix} \begin{bmatrix} -\cos\alpha_{jk}^0 \\ \cos\alpha_{jk}^0 \\ -\sin\alpha_{jk}^0 \\ \sin\alpha_{jk}^0 \end{bmatrix} + r_i. \tag{10.5}$$

A typical derivative of the square root is

$$\frac{\partial f_i}{\partial Y_k}\bigg|_{X=X^0} = \frac{1}{2}\,\frac{2(Y_k - Y_j)}{f_i}\bigg|_{X=X^0} = \frac{Y_k^0 - Y_j^0}{f_i^0} = \sin\alpha_{jk}^0.$$

This is the last component in (10.5), and $f_i^0$ denotes $\sqrt{(X_k^0 - X_j^0)^2 + (Y_k^0 - Y_j^0)^2}$.

Figure 10.2 The geometrical meaning of the gradient and the linearization process. The
figure can be read literally as if $X$ is a single variable (it is actually multi-dimensional).
[Figure labels: exact $f(X)$; linearized $f(X) = f(X^0) + (X - X^0)^{T} \operatorname{grad} f$; approximation error.]


The unknowns $(x_j, x_k, y_j, y_k)$ remain in the same sequence as indicated in $x$.
Equation (10.5) must be arranged according to the pattern in the linear observation equation
$Ax = b - r$:

$$\begin{bmatrix} \cdots & -\cos\alpha_{jk}^0 & \cos\alpha_{jk}^0 & -\sin\alpha_{jk}^0 & \sin\alpha_{jk}^0 & \cdots \end{bmatrix} x = b_i - r_i, \tag{10.6}$$

where $b_i = f_i - f_i^0$. Thus we have described the $i$th row of the system $Ax = b - r$.
Until now we have presupposed that the unknowns really represent unknown coor- 
dinates. If one or more coordinates are to be kept fixed, we omit the columns in A corre- 
sponding to those unknowns. They never obtain any increment. Likewise these unknowns 
are deleted from x. This procedure is quite analogous to the one described in Section 8.3. 
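
As a small illustration of (10.5) and (10.6), the following Python sketch forms one row of $A$ and the right side $b_i$ for a single distance observation. The function name `distance_row` is hypothetical; the numbers are those of the first distance in Example 10.1, with P treated as point $j$ and point 001 as point $k$.

```python
import math


def distance_row(xj, yj, xk, yk, observed):
    """Linearize one distance observation around preliminary coordinates.

    Returns the coefficients for (x_j, x_k, y_j, y_k) and the right side
    b_i = f_i - f_i^0, following (10.5) and (10.6).
    """
    f0 = math.hypot(xk - xj, yk - yj)   # distance from preliminary values
    cos_a = (xk - xj) / f0              # cosine of the bearing from j to k
    sin_a = (yk - yj) / f0              # sine of the bearing from j to k
    row = [-cos_a, cos_a, -sin_a, sin_a]
    return row, observed - f0


row, b = distance_row(170.71, 170.71, 270.71, 170.71, 100.01)
```

When point $k$ is held fixed, as in Example 10.1, only the first and third coefficients are kept, which gives the row $[-1 \;\; 0]$ with right side 0.01.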


Example 10.1 (Distance measurement) We observe 3 distances in order to locate point P:

$$f_1 = 100.01 \quad\text{and}\quad f_2 = 100.02 \quad\text{and}\quad f_3 = 100.03$$

with weights $C = \operatorname{diag}(1, 1, 1) = I$. Figure 10.3 indicates all preliminary values for
the coordinates. As preliminary coordinates for P we use $(X_P^0, Y_P^0) = (170.71, 170.71)$.
These values are calculated using simple geometrical relations.

The first observation equation is

$$\begin{bmatrix} -\cos\alpha_{P,001}^0 & -\sin\alpha_{P,001}^0 \end{bmatrix} \begin{bmatrix} x_P \\ y_P \end{bmatrix} = f_1 - \sqrt{(X_{001}^0 - X_P^0)^2 + (Y_{001}^0 - Y_P^0)^2} - r_1$$

or

$$\begin{bmatrix} -1 & 0 \end{bmatrix} \begin{bmatrix} x_P \\ y_P \end{bmatrix} = f_1 - 100.000 - r_1.$$

Similarly the second observation equation is

$$\begin{bmatrix} 0.7071 & 0.7071 \end{bmatrix} \begin{bmatrix} x_P \\ y_P \end{bmatrix} = f_2 - 99.999 - r_2,$$

and finally the third observation equation

$$\begin{bmatrix} 0.7071 & -0.7071 \end{bmatrix} \begin{bmatrix} x_P \\ y_P \end{bmatrix} = f_3 - 99.999 - r_3.$$

Figure 10.3 Determining the coordinates of a point by resection. The given points are
001 (270.71, 170.71), 002 (100.00, 100.00), and 003 (100.00, 241.42).


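Now all three equations together

$$\begin{bmatrix} -1 & 0 \\ 0.7071 & 0.7071 \\ 0.7071 & -0.7071 \end{bmatrix} \begin{bmatrix} x_P \\ y_P \end{bmatrix} = \begin{bmatrix} 100.01 - 100.000 \\ 100.02 - 99.999 \\ 100.03 - 99.999 \end{bmatrix} - \begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix}.$$

The normal equations $A^{T}A\hat x = A^{T}b$ are

$$\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \hat x_P \\ \hat y_P \end{bmatrix} = \begin{bmatrix} 0.0267 \\ -0.0071 \end{bmatrix}.$$

As the coefficient matrix became diagonal (why is $A^{T}A$ diagonal in this case?) the solution
is easily found as $(\hat x_P, \hat y_P) = (0.0134, -0.0071)$. The final coordinates are $(\hat X_P, \hat Y_P) =
(X_P^0 + \hat x_P, Y_P^0 + \hat y_P) = (170.723\ \text{m}, 170.703\ \text{m})$.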
In this simple example the residuals are most easily calculated as

$$\hat r = b - A\hat x = b - \hat p = \begin{bmatrix} 0.01 \\ 0.02 \\ 0.03 \end{bmatrix} - \begin{bmatrix} -0.0127 \\ 0.0040 \\ 0.0140 \end{bmatrix} = \begin{bmatrix} 0.0234 \\ 0.0165 \\ 0.0165 \end{bmatrix}.$$

Finally $\hat\sigma_0 = \sqrt{\hat r^{T}\hat r/(3 - 2)} = 0.033$ m. The covariance matrix is

$$\Sigma_{\hat x} = \hat\sigma_0^2 (A^{T}A)^{-1} = \begin{bmatrix} 0.000514 & 0 \\ 0 & 0.001029 \end{bmatrix}$$

and the standard deviations on the coordinates are $\hat\sigma_{x_P} = \sqrt{0.000514} = 0.023$ m and
$\hat\sigma_{y_P} = \sqrt{0.001029} = 0.033$ m.

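If we had chosen another basepoint, changing $Y_P^0$ to 170.69, say, we would have
had the following calculations:

$$b = \begin{bmatrix} 0.010 \\ 0.035 \\ 0.017 \end{bmatrix}$$

and the corresponding $A^{T}b$, whereupon $(\hat x_P, \hat y_P) = (0.013, 0.013)$ and finally
$(\hat X_P, \hat Y_P) = (170.723, 170.703)$, i.e. the same result as earlier.
The error calculation runs as follows

$$\hat r = \begin{bmatrix} 0.010 \\ 0.035 \\ 0.017 \end{bmatrix} - A \begin{bmatrix} \hat x_P \\ \hat y_P \end{bmatrix} = \begin{bmatrix} 0.023 \\ 0.017 \\ 0.017 \end{bmatrix}.$$

The sum of squares is $\hat r^{T}\hat r = 0.001094$ and the standard deviation for the unit weight is
$\hat\sigma_0 = 0.033$ m.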


The numerical differences in this example come into existence because we use two differ- 
ent basepoints and also because of common rounding errors. The former can be handled 
by using a reasonable procedure in which the basepoint is calculated again after every 
iteration; this is discussed in the next section. 


Iterative Improvements 


The general least-squares procedure is as follows: If necessary the observation equations
are linearized. As a result we have the gradient matrix $A$ and the right side vector $b$.
Moreover the weights are given by the matrix $C$. Next the normal equations are solved. Their
solution $\hat x^{(0)}$ is added to the vector of preliminary values $X^0$: $X^{(1)} = X^0 + \hat x^{(0)}$. The
observation and normal equations are repeated but now with $X^{(1)}$ as basepoint. Thereupon
the solution of the normals is $\hat x^{(1)}$. This is added to $X^{(1)}$ and we get $X^{(2)} = X^{(1)} + \hat x^{(1)}$. It
is not required that the gradient matrix $A$ is known very accurately. In fact one may use the
same $A$ in several iterations.
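
The procedure just described can be sketched in Python for the three distances of Example 10.1. This is only an illustration with unit weights; the function name `iterate_position`, the iteration limit and the tolerance are assumptions, and the 2 by 2 normal equations are solved by Cramer's rule rather than by Cholesky reduction.

```python
import math

# Fixed points and observed distances of Example 10.1.
POINTS = [(270.71, 170.71), (100.00, 100.00), (100.00, 241.42)]
OBSERVED = [100.01, 100.02, 100.03]


def iterate_position(x0, y0, max_iter=10, tol=1e-10):
    """Repeat linearize-and-solve until the increments are negligible."""
    x, y = x0, y0
    for _ in range(max_iter):
        n11 = n12 = n22 = t1 = t2 = 0.0
        for (xk, yk), f in zip(POINTS, OBSERVED):
            f0 = math.hypot(xk - x, yk - y)
            a1 = -(xk - x) / f0        # coefficient of x_P
            a2 = -(yk - y) / f0        # coefficient of y_P
            b = f - f0                 # observed minus computed distance
            n11 += a1 * a1
            n12 += a1 * a2
            n22 += a2 * a2
            t1 += a1 * b
            t2 += a2 * b
        det = n11 * n22 - n12 * n12
        dx = (n22 * t1 - n12 * t2) / det   # solve the 2x2 normal equations
        dy = (n11 * t2 - n12 * t1) / det
        x, y = x + dx, y + dy              # the new basepoint
        if abs(dx) < tol and abs(dy) < tol:
            break
    return x, y


X_P, Y_P = iterate_position(170.71, 170.71)
```

Starting from (170.71, 170.71) the first increment is already close to the values found in Example 10.1, and later increments are at the level of micrometers.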


Figure 10.4 The exponent of numerical precision $\nu$.


It is absolutely essential that the residuals r be computed with a higher precision 
than that of the rest of the computation. This is a general principle in all equation solving: 
Calculation of the residual is the critical computation and must be done with the most 
accuracy. It is rarely necessary to achieve such high accuracy in any other part of the 
algorithm, see Forsythe & Moler (1967), Section 13. 

Note the useful relation between the error $e$ in $x$ and the residual $r$:

$$e = A^{-1}b - x = A^{-1}(b - Ax) = A^{-1}r$$

or in norms

$$\|e\| \le \|A^{-1}\|\,\|r\|. \tag{10.7}$$


So far we have not discussed how to obtain the first guess $X^0$. The geodesist knows
best how to find a close first guess. In the geodetic terminology this is named the
computation of preliminary coordinates.

Usually the error is squared in each iteration until we reach the level of the rounding
errors. But what happens if $X^0$ is far from the solution? The answer depends on $f$.
Computational experience shows that $X^{(i)}$ often will get close to a solution $X$, but it will
move around this point in a certain distance and in many iteration steps. Sometimes it is
impossible to guess which solution we finally will obtain.

Finally there are some instances in which $X^{(i)}$ does not converge at all. Instead the
solution will reach certain limit points. This happens in case of gross errors in the data.

How can we be sure that the solution is not spoiled by the effect of nonlinearity? If
we start the procedure by using good preliminary values for the unknowns and continue the
iterative process until the right side of the normal equations is sufficiently small then we
can be rather sure that we have found a satisfactory solution of the least-squares problem.

Since 1967 geodesists at the former Danish Geodetic Institute have computed a
quantity $\nu$ based on the Cholesky reduction and defined as


(10.8)
where $z$ is the right side of the Cholesky reduced normal equations, see (9.70); $\nu$ is called
the exponent of numerical precision and it is an estimate for the number of numerically
significant digits in the least-squares problem. Geometrically $\nu$ is a measure for the
angle at $b$ under which the distance between the computed and the true solution is viewed,
see Krarup (1982). This angle cannot be calculated, but $\nu$ can (see Figure 10.4).


10.2 Geodetic Observation Equations 


Distance 


Section 10.1 showed in all detail how to linearize an observation of distance. But
computationally it can be an advantage to modify the distance observation $s_i$ by dividing it
by the value $s_i^0$ which is computed from the preliminary coordinates. Then the entries
in $A$ stemming from a distance observation become identical to those originating from an
observation of a direction.

Hence the observation is the measured distance $s_i$ divided by the distance $s_i^0$
calculated from the preliminary coordinates:

$$(s_i^0)^2 = (X_k^0 - X_j^0)^2 + (Y_k^0 - Y_j^0)^2.$$

By calculating the gradient of $s_i/s_i^0$, inserting the preliminary values, and choosing the
sequence of the unknowns as $(x_j, x_k, y_j, y_k)$, the gradient becomes

$$\operatorname{grad}\frac{s_i}{s_i^0} = \left(-\frac{X_k - X_j}{s_i s_i^0},\ \frac{X_k - X_j}{s_i s_i^0},\ -\frac{Y_k - Y_j}{s_i s_i^0},\ \frac{Y_k - Y_j}{s_i s_i^0}\right). \tag{10.9}$$


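By use of the preliminary coordinates the linearized observation equation then is

$$\begin{bmatrix} -\dfrac{X_k^0 - X_j^0}{(s_i^0)^2} & \dfrac{X_k^0 - X_j^0}{(s_i^0)^2} & -\dfrac{Y_k^0 - Y_j^0}{(s_i^0)^2} & \dfrac{Y_k^0 - Y_j^0}{(s_i^0)^2} \end{bmatrix} \begin{bmatrix} x_j \\ x_k \\ y_j \\ y_k \end{bmatrix} = \frac{s_i - s_i^0}{s_i^0} - r_i = b_i - r_i. \tag{10.10}$$

The division by $s_i^0$ formally can be looked upon as the result of observing the logarithm
of $s_i$ rather than $s_i$ itself. The subsequent differentiation of $\ln s_i$ automatically leads to
division by $s_i$.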

The a priori variance of one single observation of distance is

$$\sigma_s^2 = \sigma_d^2 + \sigma_c^2. \tag{10.11}$$

Figure 10.5 Observation of distance.

Here $\sigma_d^2$ is the variance of the distance measurement, and $\sigma_c^2$ is the collected variance of
centricity of both instrument and reflector.

If there are $n$ observations, the variance becomes

$$\sigma_s^2 = \frac{\sigma_d^2 + \sigma_c^2}{n} \tag{10.12}$$

following the law of error propagation. It has been argued that the centricity contribution
shall be divided by $n$. To our knowledge the expression shown is in good accordance with
common sense and practical experience.

Finally the weight for an observation of distance is set to $c_s = \sigma_0^2/\sigma_s^2$ where $\sigma_0^2$
denotes the variance of unit weight.


Example 10.2 (Observation of distance, revisited) In Example 10.1 we now modify the
observed distance by dividing it by the value resulting from the preliminary coordinates.
Thus we use observation equations of the type (10.10). The weight matrix is set to $C = I$
and the preliminary coordinates of P are fixed to $(X_P^0, Y_P^0) = (170.00, 170.00)$.

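The first observation equation looks like

$$\begin{bmatrix} -\dfrac{X_{001} - X_P^0}{(s_1^0)^2} & -\dfrac{Y_{001} - Y_P^0}{(s_1^0)^2} \end{bmatrix} \begin{bmatrix} x_P \\ y_P \end{bmatrix} = \frac{s_1 - s_1^0}{s_1^0} - r_1$$

or

$$\begin{bmatrix} -0.0099 & -0.0001 \end{bmatrix} \begin{bmatrix} x_P \\ y_P \end{bmatrix} = -0.0070 - r_1.$$

The second and third observation equations are similar. The three observation equations
together are

$$\begin{bmatrix} -0.0099 & -0.0001 \\ 0.0071 & 0.0071 \\ 0.0070 & -0.0070 \end{bmatrix} \begin{bmatrix} x_P \\ y_P \end{bmatrix} = \begin{bmatrix} -0.0070 \\ 0.0103 \\ 0.0003 \end{bmatrix} - \begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix}.$$

The normal equations are formed and solved; the result is $(\hat x_P, \hat y_P) = (0.724, 0.699)$ or
$(\hat X_P, \hat Y_P) = (X_P^0 + \hat x_P, Y_P^0 + \hat y_P) = (170.724\ \text{m}, 170.699\ \text{m})$.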

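The present solution deviates 1 and 4 mm from the one found in Example 10.1. So
we compute a second iteration with $(X_P^0, Y_P^0) = (170.724\ \text{m}, 170.699\ \text{m})$ as basepoint.
Here are the observation equations:

$$\begin{bmatrix} -0.0100 & 0.0000 \\ 0.00707 & 0.00707 \\ 0.00707 & -0.00707 \end{bmatrix} \begin{bmatrix} x_P \\ y_P \end{bmatrix} = \begin{bmatrix} 0.00024 \\ 0.00019 \\ 0.00013 \end{bmatrix} - \begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix}.$$

The solution now is $(\hat x_P, \hat y_P) = (-0.0006, 0.0039)$ or $(\hat X_P, \hat Y_P) = (X_P^0 + \hat x_P, Y_P^0 + \hat y_P) =
(170.723\ \text{m}, 170.703\ \text{m})$. This result agrees well with Example 10.1.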

One conclusion is that preliminary coordinates deviating up to 1 m from the final 
ones may result in changes of coordinates at mm level. 
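
The first row of the system in Example 10.2 can be checked with a short Python sketch of (10.10). The function name `normalized_distance_row` is hypothetical, and the point P is taken as the only unknown point.

```python
import math


def normalized_distance_row(xp, yp, xk, yk, observed):
    """Row of A and right side b for a distance divided by its preliminary value.

    P = (xp, yp) is the unknown point and (xk, yk) a fixed point, cf. (10.10).
    """
    s0 = math.hypot(xk - xp, yk - yp)                  # preliminary distance
    row = [-(xk - xp) / s0**2, -(yk - yp) / s0**2]     # coefficients of x_P, y_P
    b = (observed - s0) / s0
    return row, b


# Point 001 and the first observed distance, with basepoint (170.00, 170.00).
row, b = normalized_distance_row(170.00, 170.00, 270.71, 170.71, 100.01)
```

Rounded to four decimals this gives the coefficients $-0.0099$ and $-0.0001$ and the right side $-0.0070$ quoted in the example.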


Quasi-Distance 


When using electronic equipment for distance measurements it is difficult to avoid in- 
troducing systematic errors for various reasons: unknown frequency and wrong index of 
refraction, inadequate height reduction, incorrect map scale correction, a control network 
with systematically wrong scale, et cetera. For one or more of these reasons it is reasonable 
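to introduce a change of scale $\mu$ in the distance observation as described by the following
equation

$$s_i = (1 + \mu)\sqrt{(X_k - X_j)^2 + (Y_k - Y_j)^2}. \tag{10.13}$$

After having chosen the sequence of unknowns as $(x_j, x_k, y_j, y_k, \mu)$ the gradient of the left
side is

$$\operatorname{grad} s_i = \left(-(1 + \mu)\frac{X_k - X_j}{L},\ (1 + \mu)\frac{X_k - X_j}{L},\ -(1 + \mu)\frac{Y_k - Y_j}{L},\ (1 + \mu)\frac{Y_k - Y_j}{L},\ L\right),$$

where $L = \sqrt{(X_k - X_j)^2 + (Y_k - Y_j)^2}$.

After inserting the preliminary values, the linearized observation equations are

$$\begin{bmatrix} \cdots & -(1 + \mu^0)\cos\alpha_{jk}^0 & (1 + \mu^0)\cos\alpha_{jk}^0 & -(1 + \mu^0)\sin\alpha_{jk}^0 & (1 + \mu^0)\sin\alpha_{jk}^0 & s_i^0 & \cdots \end{bmatrix} \begin{bmatrix} x_j \\ x_k \\ y_j \\ y_k \\ \mu \end{bmatrix} = b_i - r_i, \tag{10.14}$$

where $b_i = s_i - s_i^0$. The determination of weights occurs according to the variance given
in (10.12).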
Pseudodistance


Distance observations also can be influenced by an error which is independent of the dis- 
tance itself. Such a distance is known as a pseudodistance. It fits the distance calculated 
from the coordinates of the terminal points except for an additive constant Z. In the realm 
of electronic distance measurements it is called an addition constant or today more often 
a reflector constant. 


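The observation is the distance between the points $j$ and $k$ plus the reflector constant:

$$s_{jk} = \sqrt{(X_k - X_j)^2 + (Y_k - Y_j)^2} + Z. \tag{10.15}$$

Let us choose the sequence of unknowns as $(x_j, x_k, y_j, y_k, z)$. Then the gradient of the left
side in (10.15) becomes

$$\operatorname{grad} s_{jk} = \left(-\frac{X_k - X_j}{s_{jk}},\ \frac{X_k - X_j}{s_{jk}},\ -\frac{Y_k - Y_j}{s_{jk}},\ \frac{Y_k - Y_j}{s_{jk}},\ +1\right). \tag{10.16}$$

The linearized observation equation is

$$\begin{bmatrix} -\dfrac{X_k^0 - X_j^0}{s_{jk}^0} & \dfrac{X_k^0 - X_j^0}{s_{jk}^0} & -\dfrac{Y_k^0 - Y_j^0}{s_{jk}^0} & \dfrac{Y_k^0 - Y_j^0}{s_{jk}^0} & 1 \end{bmatrix} \begin{bmatrix} x_j \\ x_k \\ y_j \\ y_k \\ z \end{bmatrix} = s_{jk} - s_{jk}^0 - r_i.$$

The weight of the observation is given by the expression (10.12).

It is advantageous to divide the observation equation by $s_{jk}$ so it becomes identical
to (10.10), apart from the constant $Z$.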

The constant Z, which is a characteristic for a given distance measuring instrument 
and reflector, is usually best determined through a least-squares procedure for a whole 
control network. All distance observations are entered as pseudodistances. This procedure 
is strongly recommended for small networks especially as encountered in connection with 
deformation measurements. 

If one tries to determine Z in combination with measurements of a control network 
there will be too much common measurement noise to do it safely and the determination 
thus becomes too uncertain for any purpose. An alternative is to determine $Z$ by measuring
the distances between 3 points 1, 2, and 3 on a straight line. By (10.15) each measured
distance contains $Z$ once, so $Z$ can be found as

$$Z = s_{12} + s_{23} - s_{13}.$$


Modern instruments have Z-values of a few mm. Since the standard deviations of average 
distances in control networks are greater, one might often neglect the constant Z. 
Ratios of Distances


You may take the view that the precise unit of length is unknown. This is due to lack of 
information about the frequency of the carrier or the index of refraction n or the unit length 
in an existing network which our new network has to be merged into. 

This view implies that only ratios between distances with a common terminal can be 
estimated. A ratio of distances only contributes to determination of the shape of a network, 
not its scale. This scale information has to be furnished otherwise. 

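The observation consists of the ratio of distances from point $j$ to the points $k$ and $l$:

$$\frac{s_{jk}}{s_{jl}} = \frac{\sqrt{(X_k - X_j)^2 + (Y_k - Y_j)^2}}{\sqrt{(X_l - X_j)^2 + (Y_l - Y_j)^2}}. \tag{10.17}$$

We choose the sequence of unknowns as $(x_j, x_k, x_l, y_j, y_k, y_l)$ after which the gradient of
the left side in (10.17) is

$$\operatorname{grad}\frac{s_{jk}}{s_{jl}} = \Biggl( f\Bigl(\frac{X_l - X_j}{s_{jl}^2} - \frac{X_k - X_j}{s_{jk}^2}\Bigr),\ f\,\frac{X_k - X_j}{s_{jk}^2},\ -f\,\frac{X_l - X_j}{s_{jl}^2},\ f\Bigl(\frac{Y_l - Y_j}{s_{jl}^2} - \frac{Y_k - Y_j}{s_{jk}^2}\Bigr),\ f\,\frac{Y_k - Y_j}{s_{jk}^2},\ -f\,\frac{Y_l - Y_j}{s_{jl}^2} \Biggr),$$

where we have put $f = s_{jk}/s_{jl}$. The linear observation equation now is

$$\begin{bmatrix} f^0\Bigl(\dfrac{X_l^0 - X_j^0}{(s_{jl}^0)^2} - \dfrac{X_k^0 - X_j^0}{(s_{jk}^0)^2}\Bigr) & f^0\dfrac{X_k^0 - X_j^0}{(s_{jk}^0)^2} & -f^0\dfrac{X_l^0 - X_j^0}{(s_{jl}^0)^2} & f^0\Bigl(\dfrac{Y_l^0 - Y_j^0}{(s_{jl}^0)^2} - \dfrac{Y_k^0 - Y_j^0}{(s_{jk}^0)^2}\Bigr) & f^0\dfrac{Y_k^0 - Y_j^0}{(s_{jk}^0)^2} & -f^0\dfrac{Y_l^0 - Y_j^0}{(s_{jl}^0)^2} \end{bmatrix} \begin{bmatrix} x_j \\ x_k \\ x_l \\ y_j \\ y_k \\ y_l \end{bmatrix} = b_i - r_i,$$

where $b_i = s_{jk}/s_{jl} - s_{jk}^0/s_{jl}^0$.

Let the variance of the two distances be $\sigma_{s_{jk}}^2$ and $\sigma_{s_{jl}}^2$. We obtain the correct weight
for the ratio of the two distances by using the law of propagation of variances:

$$\sigma_f^2 = f^2\left(\frac{\sigma_{s_{jk}}^2}{s_{jk}^2} + \frac{\sigma_{s_{jl}}^2}{s_{jl}^2} - \frac{2\sigma_{s_{jk}s_{jl}}}{s_{jk}s_{jl}}\right).$$

The covariance between the two measured distances typically is positive. If we ignore the
covariance $\sigma_{s_{jk}s_{jl}}$ the weight $c_f = \sigma_0^2/\sigma_f^2$ is systematically underestimated.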
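
The effect of the covariance on the variance of a ratio can be illustrated with a minimal Python sketch of the propagation formula above. The function name `ratio_variance` and all numbers are hypothetical: two distances of 100 m, each with a standard deviation of 5 mm, once taken as independent and once with a positive covariance.

```python
def ratio_variance(s_jk, s_jl, var_jk, var_jl, cov=0.0):
    """Variance of f = s_jk / s_jl by first-order propagation of variances."""
    f = s_jk / s_jl
    return f**2 * (var_jk / s_jk**2 + var_jl / s_jl**2
                   - 2.0 * cov / (s_jk * s_jl))


# Hypothetical values in metres and square metres.
var_independent = ratio_variance(100.0, 100.0, 0.005**2, 0.005**2)
var_correlated = ratio_variance(100.0, 100.0, 0.005**2, 0.005**2,
                                cov=0.5 * 0.005**2)
```

The correlated case gives the smaller variance, so dropping a positive covariance makes the ratio look less precise than it is and its weight too small.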
Observation of Coordinate and Height


The observation equation of a measured coordinate, for instance from GPS, or a measured 
height, for instance from a tide gauge, reads 


$$+1 \cdot x_i = X_{i,\text{obsv}} - X_i^0 - r_i. \tag{10.18}$$


If the observation is to be fixed to the measured value then the weight has to be 10—100 
times larger than the other weights. If it shall enter at equal terms with the other observa- 
tions then a weight of normal magnitude is appropriate. The actual situation will indicate 
the proper weight to be applied. 


Azimuth 


An azimuth observation determines the angle to the meridian. Astronomical azimuths are 
related to the geographical meridian while the magnetic azimuth relates to the magnetic 
meridian. In this case we have to know the angle between the magnetic and the geograph- 
ical meridian. 

Astronomical azimuths typically are determined with a standard deviation less than 
0.1 mgon. Magnetic azimuths cannot be determined better than 0.1 gon. 

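Let the observed azimuth from point $j$ to point $k$ be denoted $A_{jk}$. Then the
observation is

$$A_{jk} = \arctan\frac{Y_k - Y_j}{X_k - X_j}. \tag{10.19}$$

We choose the sequence of unknowns as $(x_j, x_k, y_j, y_k)$ and the gradient of the left side of
(10.19) becomes

$$\operatorname{grad} A_{jk} = \left(\frac{Y_k - Y_j}{(s_{jk})^2},\ -\frac{Y_k - Y_j}{(s_{jk})^2},\ -\frac{X_k - X_j}{(s_{jk})^2},\ \frac{X_k - X_j}{(s_{jk})^2}\right). \tag{10.20}$$

The linearized version of (10.19) now is

$$\begin{bmatrix} \dfrac{\sin\alpha_{jk}^0}{s_{jk}^0} & -\dfrac{\sin\alpha_{jk}^0}{s_{jk}^0} & -\dfrac{\cos\alpha_{jk}^0}{s_{jk}^0} & \dfrac{\cos\alpha_{jk}^0}{s_{jk}^0} \end{bmatrix} \begin{bmatrix} x_j \\ x_k \\ y_j \\ y_k \end{bmatrix} = A_{jk} - A_{jk}^0 - r_i.$$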
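
The gradient (10.20) can be compared with a finite difference in a short Python sketch. The function names and the coordinates are hypothetical; `math.atan2` is used in place of the plain arctangent so that the quadrant is handled automatically.

```python
import math


def azimuth(xj, yj, xk, yk):
    """Azimuth from j to k in radians, with x to the North and y to the East."""
    return math.atan2(yk - yj, xk - xj)


def azimuth_row(xj, yj, xk, yk):
    """Coefficients of (x_j, x_k, y_j, y_k) for an azimuth observation, cf. (10.20)."""
    dx, dy = xk - xj, yk - yj
    s2 = dx * dx + dy * dy
    return [dy / s2, -dy / s2, -dx / s2, dx / s2]


# Hypothetical coordinates in the order xj, yj, xk, yk.
coords = [100.0, 200.0, 160.0, 280.0]
row = azimuth_row(*coords)

h = 1e-6
shifted = list(coords)
shifted[0] += h                       # move x_j by a small amount
numeric = (azimuth(*shifted) - azimuth(*coords)) / h
```

Here the side has length 100, so the first coefficient is $80/100^2 = 0.008$, and the finite difference `numeric` agrees with it to about six decimals.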


Horizontal Direction 


The results of observations of horizontal directions at a point $j$ are reduced sets (mean sets)
in which the first direction has the reduced value 0 gon. The observation for the bearing
$A_{jk}$ of the side $jk$ is then according to Figure 10.6

$$A_{jk} - Z_j = R_{jk}, \tag{10.21}$$

where $R_{jk}$ denotes the observed bearing for the same side. The constant $Z_j$ is the bearing
for the direction 0 gon of the horizontal circle of the theodolite. It is named the orientation
unknown.

Figure 10.6 Observation of a horizontal direction.

We stress that $Z_j$ is a characteristic quantity for each separate setup of the theodolite.
If we again place the theodolite in the same point we get a new value of $Z_j$!

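The bearing is

$$A_{jk} = \arctan\frac{Y_k - Y_j}{X_k - X_j} \quad\text{and the derivative}\quad \frac{d(\arctan x)}{dx} = \frac{1}{1 + x^2}.$$

It is easy to calculate the gradient of the left side of the observation equation when the
sequence of unknowns is $(x_j, x_k, y_j, y_k, z_j)$:

$$\operatorname{grad}(A_{jk} - Z_j) = \left(\frac{Y_k - Y_j}{(s_{jk})^2},\ -\frac{Y_k - Y_j}{(s_{jk})^2},\ -\frac{X_k - X_j}{(s_{jk})^2},\ \frac{X_k - X_j}{(s_{jk})^2},\ -1\right). \tag{10.22}$$

We introduce preliminary values $Z_j = Z_j^0 + z_j$ and $A_{jk}^0 = \arctan\dfrac{Y_k^0 - Y_j^0}{X_k^0 - X_j^0}$ and finally the
linearized observation equation

$$\begin{bmatrix} \dfrac{\sin\alpha_{jk}^0}{s_{jk}^0} & -\dfrac{\sin\alpha_{jk}^0}{s_{jk}^0} & -\dfrac{\cos\alpha_{jk}^0}{s_{jk}^0} & \dfrac{\cos\alpha_{jk}^0}{s_{jk}^0} & -1 \end{bmatrix} \begin{bmatrix} x_j \\ x_k \\ y_j \\ y_k \\ z_j \end{bmatrix} = R_{jk} + Z_j^0 - A_{jk}^0 - r_i. \tag{10.23}$$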
Figure 10.7 Free stationing.


The a priori variance of a bearing is

$$\sigma_r^2 = \frac{(\sigma_c\,\omega/s_{jk})^2 + \sigma_t^2}{n}, \tag{10.24}$$

where $n$ denotes the number of observations (the number of sets if $\sigma_r$ is to refer to a bearing
measured with $n$ sets) and $\sigma_t^2$ is the variance of a direction measured with 1 set and $\sigma_c$ is
the common centricity contribution from theodolite and signal.

Accordingly the weight of a direction observation is $c_r = \sigma_0^2/\sigma_r^2$ where $\sigma_0^2$ is the
variance of unit weight.

For computational reasons it is an advantage to eliminate the orientation unknowns 
from the normal equations. The procedure is described in Section 11.7. 


Example 10.3 To illustrate the interaction between different observation types we take an 
example from daily geodetic practice, namely a free stationing. The data were collected 
by a group of students making their first experiences with a total station. 

The given points are situated as shown in Figure 10.7 and with postulated coordinates 
and observations: 


Station P: observations to the stations $i$

reduced mean direction $v_{Pi}$ [gon] | horizontal distance $s_{Pi}$ [m]
0.000 | 51.086
107.548 | 50.771
191.521 | 65.249
231.709 | 110.362


The a priori variances are fixed according to

1 distances, cf. (10.12)

$$\sigma_s^2 = \frac{\sigma_d^2 + \sigma_c^2}{n} = \frac{a^2 + (bs)^2 + \sigma_c^2}{n}$$

2 directions, cf. (10.24)

$$\sigma_r^2 = \frac{(\sigma_c\,\omega/s)^2 + \sigma_t^2}{n}.$$

We use the following values: $a = 5$ mm, $b = 3 \times 10^{-5}$, $\sigma_c = 5$ mm, $\sigma_t = 1$ mgon,
$n = 2$, and $\omega = 1/\sin(0.001\ \text{gon}) = 63\,662$. We seek the coordinates of point P and their
standard deviations.

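First we have to calculate a set of preliminary coordinates for P. One solution is
$(X_P^0, Y_P^0) = (76.40, -878.35)$. The preliminary value for $Z_P$ is fixed as

$$Z_P^0 = \arctan\frac{Y_{100} - Y_P^0}{X_{100} - X_P^0} = 279.312\ \text{gon}$$

and we get the linearized observation equations for the four distances and the four
directions, cf. (10.10) and (10.23):

$$\begin{aligned}
-\frac{X_{100} - X_P^0}{(s_{P,100}^0)^2}\,x_P - \frac{Y_{100} - Y_P^0}{(s_{P,100}^0)^2}\,y_P + 0 \cdot z_P &= \frac{s_{P,100} - s_{P,100}^0}{s_{P,100}^0} - r_1 \\
&\ \,\vdots \\
-\frac{X_{103} - X_P^0}{(s_{P,103}^0)^2}\,x_P - \frac{Y_{103} - Y_P^0}{(s_{P,103}^0)^2}\,y_P + 0 \cdot z_P &= \frac{s_{P,103} - s_{P,103}^0}{s_{P,103}^0} - r_4 \\
\frac{Y_{100} - Y_P^0}{(s_{P,100}^0)^2}\,x_P - \frac{X_{100} - X_P^0}{(s_{P,100}^0)^2}\,y_P - z_P &= v_{P,100} + Z_P^0 - \arctan\frac{Y_{100} - Y_P^0}{X_{100} - X_P^0} - r_5 \\
&\ \,\vdots \\
\frac{Y_{103} - Y_P^0}{(s_{P,103}^0)^2}\,x_P - \frac{X_{103} - X_P^0}{(s_{P,103}^0)^2}\,y_P - z_P &= v_{P,103} + Z_P^0 - \arctan\frac{Y_{103} - Y_P^0}{X_{103} - X_P^0} - r_8.
\end{aligned}$$

We arrange the unknowns in the following sequence

$$x = \begin{bmatrix} x_P \\ y_P \\ z_P \end{bmatrix}$$

and the coefficient matrix $A$ is

$$A = \begin{bmatrix} -\dfrac{X_{100} - X_P^0}{(s_{P,100}^0)^2} & -\dfrac{Y_{100} - Y_P^0}{(s_{P,100}^0)^2} & 0 \\ \vdots & \vdots & \vdots \\ \dfrac{Y_{100} - Y_P^0}{(s_{P,100}^0)^2} & -\dfrac{X_{100} - X_P^0}{(s_{P,100}^0)^2} & -1 \\ \vdots & \vdots & \vdots \end{bmatrix} = \begin{bmatrix} 0.0062 & -0.0185 & 0 \\ -0.0193 & -0.0041 & 0 \\ 0.0068 & 0.0137 & 0 \\ 0.0015 & 0.0089 & 0 \\ -0.0185 & -0.0062 & -1 \\ -0.0041 & 0.0193 & -1 \\ 0.0137 & 0.0068 & -1 \\ 0.0089 & -0.0015 & -1 \end{bmatrix}$$

$$b^{T} = \begin{bmatrix} 6.65 & -15.05 & -1.35 & 0.80 & 0.00 & 7.90 & 19.98 & 19.21 \end{bmatrix} \times 10^{-4}.$$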
Note that the last four components in $b$ are in units of radians. The diagonal entries of $W$
are defined as the inverse of the standard deviation of the observations

$$\operatorname{diag}(W) = \begin{bmatrix} 0.195 & 0.195 & 0.193 & 0.181 & 0.224 & 0.224 & 0.285 & 0.463 \end{bmatrix}.$$
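
The entries of $\operatorname{diag}(W)$ can be approximately reproduced from the a priori variances. The Python sketch below is an illustration under assumptions: the function names are hypothetical, the observed distances are used for $s$, and the units (1/mm for distances, 1/mgon for directions) and the value $b = 3 \times 10^{-5}$ are inferred from the printed weights rather than stated in the text.

```python
import math

OMEGA = 1.0 / math.sin(0.001 * math.pi / 200.0)   # about 63 662, radians to mgon


def distance_weight(s_m, a_mm=5.0, b=3e-5, sigma_c_mm=5.0, n=2):
    """1 / sigma_s in 1/mm for a horizontal distance of s_m metres."""
    s_mm = 1000.0 * s_m
    var = (a_mm**2 + (b * s_mm)**2 + sigma_c_mm**2) / n
    return 1.0 / math.sqrt(var)


def direction_weight(s_m, sigma_c_mm=5.0, sigma_t_mgon=1.0, n=2):
    """1 / sigma_r in 1/mgon for a direction with sight length s_m metres."""
    s_mm = 1000.0 * s_m
    var = ((sigma_c_mm * OMEGA / s_mm)**2 + sigma_t_mgon**2) / n
    return 1.0 / math.sqrt(var)


distances = [51.086, 50.771, 65.249, 110.362]
w = ([distance_weight(s) for s in distances]
     + [direction_weight(s) for s in distances])
```

The computed values agree with the printed entries to within about 0.002; the small differences may come from using observed instead of preliminary distances.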


The weight normalized observation equations $WAx = Wb - Wr$ are

$$\begin{bmatrix} 0.001218 & -0.003615 & 0 \\ -0.003764 & -0.000791 & 0 \\ 0.001314 & 0.002651 & 0 \\ 0.000279 & 0.001616 & 0 \\ -0.004153 & -0.001399 & -0.224 \\ -0.000909 & 0.004324 & -0.224 \\ 0.003914 & 0.001940 & -0.285 \\ 0.004134 & -0.000715 & -0.463 \end{bmatrix} \begin{bmatrix} x_P \\ y_P \\ z_P \end{bmatrix} = \begin{bmatrix} 0.000130 \\ -0.000293 \\ -0.000026 \\ 0.000014 \\ 0.000000 \\ 0.000177 \\ 0.000569 \\ 0.000889 \end{bmatrix} - Wr.$$


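Accordingly the weighted normal equations are $(WA)^{T}(WA)x = (WA)^{T}Wb$. With $C =
W^{T}W$ this is $A^{T}CAx = A^{T}Cb$:

$$\begin{bmatrix} 0.0000679 & 0.0000021 & -0.0018956 \\ 0.0000021 & 0.0000483 & -0.0008772 \\ -0.0018956 & -0.0008772 & 0.3959460 \end{bmatrix} \begin{bmatrix} x_P \\ y_P \\ z_P \end{bmatrix} = \begin{bmatrix} 0.0000070 \\ 0.0000009 \\ -0.0006137 \end{bmatrix}.$$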


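The normal equations are solved by Cholesky reduction:

$$\begin{aligned}
l_{11} &= \sqrt{a_{11}} \\
l_{21} &= a_{12}/l_{11} & l_{22} &= \sqrt{a_{22} - l_{21}^2} \\
l_{31} &= a_{13}/l_{11} & l_{32} &= (a_{23} - l_{21}l_{31})/l_{22} & l_{33} &= \sqrt{a_{33} - l_{31}^2 - l_{32}^2}
\end{aligned}$$

$$\begin{aligned}
z_1 &= b_1/l_{11} \\
z_2 &= (b_2 - l_{21}z_1)/l_{22} \\
z_3 &= (b_3 - l_{31}z_1 - l_{32}z_2)/l_{33}
\end{aligned}$$

or

$$\begin{aligned}
l_{11} &= 0.008242 \\
l_{21} &= 0.000250 & l_{22} &= 0.006942 \\
l_{31} &= -0.229989 & l_{32} &= -0.118063 & l_{33} &= 0.573683
\end{aligned}$$

$$\begin{aligned}
z_1 &= 0.000855 \\
z_2 &= 0.000106 \\
z_3 &= -0.000705
\end{aligned}$$

$$\sum z_i^2 = 1.239347 \times 10^{-6}.$$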
The back solution is

$$\begin{aligned}
x_3 &= z_3/l_{33} \\
x_2 &= (z_2 - l_{32}x_3)/l_{22} \\
x_1 &= (z_1 - l_{21}x_2 - l_{31}x_3)/l_{11}.
\end{aligned}$$

The result is $\hat x_P = 0.070$ m, $\hat y_P = -0.006$ m, and $\hat z_P = -0.078$ gon.
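
The reduction and back solution above can be sketched in Python for the 3 by 3 case. The function name `cholesky_solve3` is hypothetical. The matrix and right side are the rounded entries of the printed normal equations, so the result agrees with the quoted increments only to about 1 mm and 1 mgon.

```python
import math


def cholesky_solve3(a, b):
    """Solve a symmetric positive definite 3x3 system a x = b by Cholesky reduction."""
    l11 = math.sqrt(a[0][0])
    l21 = a[0][1] / l11
    l22 = math.sqrt(a[1][1] - l21**2)
    l31 = a[0][2] / l11
    l32 = (a[1][2] - l21 * l31) / l22
    l33 = math.sqrt(a[2][2] - l31**2 - l32**2)
    # forward reduction of the right side
    z1 = b[0] / l11
    z2 = (b[1] - l21 * z1) / l22
    z3 = (b[2] - l31 * z1 - l32 * z2) / l33
    # back solution
    x3 = z3 / l33
    x2 = (z2 - l32 * x3) / l22
    x1 = (z1 - l21 * x2 - l31 * x3) / l11
    return [x1, x2, x3], [z1, z2, z3]


N = [[0.0000679, 0.0000021, -0.0018956],
     [0.0000021, 0.0000483, -0.0008772],
     [-0.0018956, -0.0008772, 0.3959460]]
t = [0.0000070, 0.0000009, -0.0006137]
x, z = cholesky_solve3(N, t)
z_p_gon = x[2] * 200.0 / math.pi      # orientation increment in gon
```

The third unknown comes out in radians, which is why it is converted to gon before it is compared with the orientation increment.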


This results in the following coordinates for P:

$X_P^0$: 76.400 m | $Y_P^0$: $-878.350$ m
$+\hat x_P$: 0.070 m | $+\hat y_P$: $-0.006$ m

$\hat X_P$: 76.470 m | $\hat Y_P$: $-878.356$ m

and the following value for the orientation unknown

$$\hat Z_P = Z_P^0 + \hat z_P = 379.312 - 0.078 = 379.234\ \text{gon}.$$
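The a posteriori variance calculation is done according to the survey on page 336 and (9.70)

$$\hat r^{T}C\hat r = b^{T}Cb - (A^{T}Cb)^{T}\hat x = b^{T}Cb - z^{T}z = 1.2504703 \times 10^{-6} - 1.2393469 \times 10^{-6} = 1.1123 \times 10^{-8}.$$

Finally $\hat\sigma_0$ is likewise calculated according to the same survey:

$$\hat\sigma_0^2 = \frac{\hat r^{T}C\hat r}{m - n} = \frac{1.1123 \times 10^{-8}}{8 - 3} = 0.2225 \times 10^{-8}$$

$$\hat\sigma_0 = 0.47 \times 10^{-4}.$$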


Next, we want to determine the confidence ellipse in P. We start by calculating the inverse
normal equation matrix

$$(A^{T}CA)^{-1} = \begin{bmatrix} 1.7018 & 0.0785 & 0.0083 \\ 0.0785 & 2.1627 & 0.0052 \\ 0.0083 & 0.0052 & 0.00030 \end{bmatrix} \times 10^{4}.$$

The part concerning the coordinates $X_P$ and $Y_P$ is the upper left 2 by 2 block matrix
of $(A^{T}CA)^{-1}$. Hence the covariance matrix $\Sigma_{\hat x}$ for the coordinates is

$$\Sigma_{\hat x} = \begin{bmatrix} \hat\sigma_x^2 & \hat\sigma_{xy} \\ \hat\sigma_{xy} & \hat\sigma_y^2 \end{bmatrix} = \begin{bmatrix} 0.3786 & 0.0175 \\ 0.0175 & 0.4811 \end{bmatrix} \times 10^{-4}\ \text{m}^2.$$

The bearing $\varphi$ for the major axis of the confidence ellipse is calculated according to (9.75)

$$\tan 2\varphi = \frac{2\hat\sigma_{xy}}{\hat\sigma_x^2 - \hat\sigma_y^2} = \frac{2 \times 0.0175}{0.3786 - 0.4811} \quad\text{and}\quad \varphi = 89.5\ \text{gon}.$$

The square roots of the eigenvalues of $\Sigma_{\hat x}$ are

$$\begin{aligned}
\sqrt{\lambda_1} &= 7.0\ \text{mm} = a \\
\sqrt{\lambda_2} &= 6.1\ \text{mm} = b
\end{aligned}$$

which denote the semi major and the semi minor axis of the confidence ellipse of point P.
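
The axes and the bearing of this ellipse follow from the 2 by 2 covariance matrix in closed form. The Python sketch below is an illustration only and not the M-file ellaxes; the function name `confidence_ellipse` is hypothetical, and `math.atan2` is used so that $2\varphi$ lands in the correct quadrant.

```python
import math


def confidence_ellipse(var_x, var_y, cov_xy):
    """Semi axes and bearing (in gon) of the major axis for a 2x2 covariance matrix."""
    mean = 0.5 * (var_x + var_y)
    radius = math.hypot(0.5 * (var_x - var_y), cov_xy)
    a = math.sqrt(mean + radius)          # square root of the larger eigenvalue
    b = math.sqrt(mean - radius)          # square root of the smaller eigenvalue
    phi = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    return a, b, math.degrees(phi) / 0.9  # 1 gon equals 0.9 degrees


# Covariance matrix of Example 10.3 in square metres.
a, b, phi_gon = confidence_ellipse(0.3786e-4, 0.4811e-4, 0.0175e-4)
```

With the covariance matrix of Example 10.3 the semi axes come out close to 7.0 mm and 6.1 mm and the bearing close to 89.5 gon.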


Trigonometric Leveling 


An often used expression for the trigonometric leveling observation $h$ looks like

$$h = s\cos z + \frac{1 - k}{2R}\,s^2\sin^2 z + i - r. \tag{10.25}$$

Here $s$ denotes the slope distance between the total station and the reflector, $z$ is the
observed zenith distance, $i$ is the height of instrument above the one mark, and $r$ is the height
of the reflector above the other mark, see Figure 10.8. Furthermore $R \approx 6370$ km denotes
the mean radius of the Earth and the coefficient of refraction is for mid-latitudes about
$k = 0.13$ and for instance in Greenland 0.17 to 0.19. Under other skies $k$ is even negative.

Height differences calculated according to (10.25) are usually applied on equal terms 
with height differences as determined by geometrical leveling. Only we have to determine 
the weight for the trigonometrical leveling observation. 


Figure 10.8 Trigonometric leveling. [Figure labels: Reflector; Instrument; Surface of the earth; $s^2\sin^2 z/2R$.]
We apply the law of error propagation (9.57) and the weight relation (9.44) for
determining the weight. We use (9.57) on (10.25) and get

$$\sigma_h^2 = \Bigl(\cos z + \frac{1 - k}{R}\,s\sin^2 z\Bigr)^2\sigma_s^2 + \Bigl(-s\sin z + \frac{1 - k}{2R}\,s^2\sin 2z\Bigr)^2\sigma_z^2 + \frac{s^4\sin^4 z}{4R^2}\,\sigma_k^2 + \sigma_i^2 + \sigma_r^2 \tag{10.26}$$

where we regard $R$ as a constant. It is reasonable to put $\sin z \approx 1$ and $s/R \approx 0$ to get the
simplified expression

$$\sigma_h^2 = \cos^2 z\,\sigma_s^2 + s^2\sigma_z^2 + \frac{s^4}{4R^2}\,\sigma_k^2 + \sigma_i^2 + \sigma_r^2. \tag{10.27}$$

We shall use this to investigate different situations:


1 The most frequent use of trigonometric leveling is by the total station in a free
stationing. The height of the individual points is measured by placing the reflector at
them. Realistic values for the standard deviations are $\sigma_r = 2$ mm, $\sigma_z = 1$ mgon, and
$z = 95$ gon. As we only deal with differences of height, $i$ is eliminated and
consequently $\sigma_i = 0$. Let $\sigma_k = 0.04$, $s = 200$ m and $\sigma_s = 5$ mm; then the variance of a
height difference $h$ between two points at most 400 m apart is

$$\sigma_h^2 = 2\Bigl(0.006 \cdot 25 + \frac{200\,000^2}{63\,662^2} + \frac{200\,000^4 \cdot 0.04^2}{4(6370 \cdot 10^6)^2} + 2^2\Bigr) = 2(0.15 + 9.8 + 0.016 + 4) = 28\ \text{mm}^2.$$

The standard deviation of a height difference between points up to 400 m apart is,
under the given presumptions, 5 mm which is equivalent to $\sigma_h = 8\sqrt{L}$ mm where
the length $L$ of the leveled way is in units of km.

Is it possible to obtain a smaller variance by using sights of 50 m, for example, but
in turn get 10 set-ups per km? The result is $20(0.15 + 0.7 + 4) = 100\ \text{mm}^2$. So the
decisive fact is whether $\sigma_r$ is set to 1 or 2 mm.

These considerations reveal that trigonometric leveling qualitatively fully competes
with geometric leveling.


2 With the same assumptions, but with a longer distance $s = 1$ km the calculations are

$$\sigma_h^2 = 0.006 \cdot 100 + \frac{1\,000\,000^2}{63\,662^2} + \frac{1\,000\,000^4 \cdot 0.04^2}{4(6370 \cdot 10^6)^2} + 2^2 = 0.6 + 246.7 + 10 + 4 = 261.3\ \text{mm}^2.$$

Now we see that the important term is $s^2\sigma_z^2$ and why the fixing of weight has to happen
according to

$$c = \frac{\sigma_0^2}{\sigma_h^2} \approx \frac{\sigma_0^2}{s^2\sigma_z^2}.$$
So trigonometric leveling is weighted inversely proportional to the square of the length
of sight.

The expression for the variance shows that trigonometric leveling with short sights, 
and a possibly less accurate observation of the zenith distance, gives fully reliable results. 
With longer sights even a good observation of the zenith distance cannot compensate for 
the worsening refraction influence. This is one reason why trigonometric leveling with 
long sight lengths will never obtain a wide acceptance in flat and hilly landscapes. In 
mountainous areas it can be adopted. 
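The two variance budgets above can be recomputed with a few lines of Python. This is only a sketch under stated assumptions: it takes the variance to be the sum of a distance term $\cos^2 z\,\sigma_s^2$, a zenith-distance term $s^2\sigma_z^2$ (with $\sin z$ taken as 1, as the printed numbers do), a refraction term $s^4\sigma_k^2/4R^2$, and the height terms $\sigma_i^2 + \sigma_r^2$. The function name and argument names are illustrative, not from the text.

```python
import math

RHO_MGON = 200000.0 / math.pi   # milligon per radian (about 63 662)
EARTH_RADIUS_MM = 6370.0e6      # 6370 km in millimetres, as in the text


def height_variance_terms(s_mm, z_gon, sigma_s_mm, sigma_z_mgon,
                          sigma_k, sigma_r_mm, sigma_i_mm=0.0):
    """Return the four contributions to sigma_h^2 in mm^2.

    Assumption: the zenith-distance term uses sin z = 1.
    """
    z = z_gon * math.pi / 200.0
    distance = (math.cos(z) * sigma_s_mm) ** 2
    zenith = (s_mm * sigma_z_mgon / RHO_MGON) ** 2
    refraction = s_mm ** 4 * sigma_k ** 2 / (4.0 * EARTH_RADIUS_MM ** 2)
    heights = sigma_i_mm ** 2 + sigma_r_mm ** 2
    return distance, zenith, refraction, heights


# Case 1: height difference from two 200 m sights, sigma_s = 5 mm
case1 = 2.0 * sum(height_variance_terms(200e3, 95.0, 5.0, 1.0, 0.04, 2.0))

# Case 2: a single 1 km sight, sigma_s = 10 mm
terms2 = height_variance_terms(1000e3, 95.0, 10.0, 1.0, 0.04, 2.0)
case2 = sum(terms2)

print(round(case1, 1), [round(t, 2) for t in terms2], round(case2, 1))
```

Printing the individual terms makes it easy to see which error source dominates at a given sight length.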


10.3 Three-Dimensional Model 


At the beginning of this chapter we introduced a plane coordinate system X, Y and devel- 
oped the whole theory in this plane. 

Because of the introduction of GPS, many geodesists want to solve their least-squares 
problems in a three-dimensional coordinate system X, Y, Z. This is the natural setting for 
a combination of classical geodetic operations and the global positioning system. So we 
shall develop most of the observational types again, in a 3-dimensional setting. 

First of all a few words about the whole idea. The classical geodetic observation 
practice is characterized by a lot of small reductions and corrections to the observations 
performed at the surface of the Earth. This happens in order that the least-squares compu- 
tations can easily be done in a horizontal 2-dimensional model at mean sea level. One 
can perceive this reduction practice as unsatisfactory today. With excellent computa- 
tional means such a procedure can be looked upon as outdated and less rigorous than a 
3-dimensional model. 

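So we start by introducing the relation between Cartesian $(X, Y, Z)$ coordinates and geographical $(\varphi, \lambda, h)$ coordinates

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} (N_\varphi + h)\cos\varphi\cos\lambda \\ (N_\varphi + h)\cos\varphi\sin\lambda \\ \bigl((1-f)^2 N_\varphi + h\bigr)\sin\varphi \end{bmatrix}. \tag{10.28}$$

The reference ellipsoid is the surface given by $X^2/a^2 + Y^2/a^2 + Z^2/b^2 = 1$ with semi-axes $a$ and $b = (1-f)a$. Recall the radius of curvature in the prime vertical (which is the vertical plane normal to the astronomical meridian)

$$N_\varphi = \frac{a}{\sqrt{1 - f(2-f)\sin^2\varphi}}. \tag{10.29}$$

The unit vector $u$ normal to the surface is

$$u = \begin{bmatrix} \cos\varphi\cos\lambda \\ \cos\varphi\sin\lambda \\ \sin\varphi \end{bmatrix}. \tag{10.30}$$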
Figure 10.9 Observation of distance $s$, zenith distance $Z_{jk}$ and azimuth $A_{jk}$ between points $j$ and $k$.


The tangent $n$ and binormal $e$ unit vectors are derivatives of $u$ ($n$ alludes to northing, $e$ to easting, and $u$ to up, i.e. the normal direction to the surface):

$$n = \frac{\partial u}{\partial\varphi} = \begin{bmatrix} -\sin\varphi\cos\lambda \\ -\sin\varphi\sin\lambda \\ \cos\varphi \end{bmatrix} \quad\text{and}\quad e = \frac{1}{\cos\varphi}\frac{\partial u}{\partial\lambda} = \begin{bmatrix} -\sin\lambda \\ \cos\lambda \\ 0 \end{bmatrix}. \tag{10.31}$$

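Verify that $e = n \times u$.

The unit vectors $n$, $e$, $u$ provide the natural coordinate frame at a point on the reference ellipsoid, see Figure 10.9. For reasons of reference we collect the three unit vectors $e$, $n$, $u$ into the orthogonal matrix

$$R = \begin{bmatrix} e & n & u \end{bmatrix} = \begin{bmatrix} -\sin\lambda & -\sin\varphi\cos\lambda & \cos\varphi\cos\lambda \\ \cos\lambda & -\sin\varphi\sin\lambda & \cos\varphi\sin\lambda \\ 0 & \cos\varphi & \sin\varphi \end{bmatrix}. \tag{10.32}$$

Now let the difference vector $v$ between two points $j$ and $k$ on the surface of the Earth (not necessarily on the ellipsoid) be

$$v = \begin{bmatrix} X_k - X_j \\ Y_k - Y_j \\ Z_k - Z_j \end{bmatrix}. \tag{10.33}$$

Immediately we get from Figure 10.9 the $n$, $e$, $u$ components of $v$:

$$\begin{aligned} v^{\mathrm T}n &= s\sin Z_{jk}\cos A_{jk} \\ v^{\mathrm T}e &= s\sin Z_{jk}\sin A_{jk} \\ v^{\mathrm T}u &= s\cos Z_{jk}. \end{aligned}$$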
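The local frame and the three components of $v$ can be illustrated numerically. The following sketch builds $e$, $n$, $u$ for a given latitude and longitude and recovers $s$, $Z_{jk}$ and $A_{jk}$ from a difference vector. The station position and the vector are hypothetical values chosen for illustration, and the helper names are not from the text.

```python
import math


def local_frame(phi, lam):
    """Unit vectors e (east), n (north), u (up) at latitude phi, longitude lam."""
    sp, cp = math.sin(phi), math.cos(phi)
    sl, cl = math.sin(lam), math.cos(lam)
    e = (-sl, cl, 0.0)
    n = (-sp * cl, -sp * sl, cp)
    u = (cp * cl, cp * sl, sp)
    return e, n, u


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def polar_observables(v, phi, lam):
    """Slope distance, zenith distance and azimuth of v in the local frame."""
    e, n, u = local_frame(phi, lam)
    s = math.sqrt(dot(v, v))
    zenith = math.acos(dot(v, u) / s)
    azimuth = math.atan2(dot(v, e), dot(v, n))
    return s, zenith, azimuth


phi, lam = math.radians(57.0), math.radians(10.0)   # hypothetical station
v = (120.0, -80.0, 45.0)                            # hypothetical vector in metres
e, n, u = local_frame(phi, lam)
s, zenith, azimuth = polar_observables(v, phi, lam)
print(s, zenith, azimuth)
```

Using `atan2` instead of a plain arctangent keeps the azimuth in the correct quadrant.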




Theoretically, in 3-dimensional geodesy we have to consider 5 unknowns at each point: $X$, $Y$, $Z$ and two parameters defining the direction of the plumb line. A convenient choice is the astronomical latitude $\Phi$ and longitude $\Lambda$.

Above we already have used the geodetic latitude $\varphi$, longitude $\lambda$, and ellipsoidal height $h$. They are point coordinates equivalent to $X$, $Y$, $Z$. The astronomical coordinates are to be looked upon as direction parameters defining the direction of the plumb line.


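Horizontal Direction As usual the bearing $A_{jk}$ is described by

$$A_{jk} = \arctan\frac{v^{\mathrm T}e}{v^{\mathrm T}n} = \arctan\frac{-(X_k - X_j)\sin\lambda + (Y_k - Y_j)\cos\lambda}{-(X_k - X_j)\sin\varphi\cos\lambda - (Y_k - Y_j)\sin\varphi\sin\lambda + (Z_k - Z_j)\cos\varphi}.$$

We introduce the earlier relation between the orientation unknown $\mathrm{Z}_j$ and the observed bearing $R_{jk}$:

$$A_{jk} = R_{jk} + \mathrm{Z}_j.$$

There is a little conflict between the notation for the coordinate $Z_j$ (in italic) and the orientation unknown $\mathrm{Z}_j$ (in Roman type). As earlier we use the following sequence of the nine unknowns

$$x = \begin{bmatrix} \ldots & x_j & y_j & z_j & d\Phi_j & d\Lambda_j & x_k & y_k & z_k & \mathrm{z}_j & \ldots \end{bmatrix}^{\mathrm T}.$$

We introduce preliminary values $X_j = X_j^0 + x_j$, $Y_j = Y_j^0 + y_j$, ..., $\Phi_j = \Phi_j^0 + d\Phi_j$, $\mathrm{Z}_j = \mathrm{Z}_j^0 + \mathrm{z}_j$.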

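The difficult step is now to perform the calculation for $\operatorname{grad}(A_{jk} - \mathrm{Z}_j)$ correctly. There are nine coefficients to be calculated. We start with

$$\begin{aligned}
\frac{\partial A_{jk}}{\partial X_j} &= \frac{1}{1 + \tan^2 A_{jk}}\,\frac{(v^{\mathrm T}n)\sin\lambda - \sin\varphi\cos\lambda\,(v^{\mathrm T}e)}{(v^{\mathrm T}n)^2} \\
&= \cos^2 A_{jk}\,\frac{\sin\lambda - \sin\varphi\cos\lambda\tan A_{jk}}{s\sin Z_{jk}\cos A_{jk}} = \frac{\sin\lambda\cos A_{jk} - \sin\varphi\cos\lambda\sin A_{jk}}{s\sin Z_{jk}} \\
\frac{\partial A_{jk}}{\partial Y_j} &= \cos^2 A_{jk}\,\frac{-\cos\lambda - \sin\varphi\sin\lambda\tan A_{jk}}{s\sin Z_{jk}\cos A_{jk}} = \frac{-\cos\lambda\cos A_{jk} - \sin\varphi\sin\lambda\sin A_{jk}}{s\sin Z_{jk}} \\
\frac{\partial A_{jk}}{\partial Z_j} &= \cos^2 A_{jk}\,\frac{(v^{\mathrm T}e)\cos\varphi}{(v^{\mathrm T}n)^2} = \cos^2 A_{jk}\,\frac{\cos\varphi\tan A_{jk}}{s\sin Z_{jk}\cos A_{jk}} = \frac{\cos\varphi\sin A_{jk}}{s\sin Z_{jk}}.
\end{aligned}$$

Then the derivatives with respect to the astronomical latitude $\Phi$ and the astronomical longitude $\Lambda$:

$$\begin{aligned}
\frac{\partial A_{jk}}{\partial\Phi} &= \cos^2 A_{jk}\,\frac{(v^{\mathrm T}n)\,v^{\mathrm T}\frac{\partial e}{\partial\Phi} - (v^{\mathrm T}e)\,v^{\mathrm T}\frac{\partial n}{\partial\Phi}}{(v^{\mathrm T}n)^2} = \cos^2 A_{jk}\,\frac{(v^{\mathrm T}u)(v^{\mathrm T}e)}{(v^{\mathrm T}n)^2} = \cot Z_{jk}\sin A_{jk} \\
\frac{\partial A_{jk}}{\partial\Lambda} &= \cos^2 A_{jk}\,\frac{(v^{\mathrm T}n)\,v^{\mathrm T}\frac{\partial e}{\partial\Lambda} - (v^{\mathrm T}e)\,v^{\mathrm T}\frac{\partial n}{\partial\Lambda}}{(v^{\mathrm T}n)^2} = \sin\varphi - \cos A_{jk}\cos\varphi\cot Z_{jk}.
\end{aligned}$$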
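Now the gradient is

$$\begin{aligned}
\operatorname{grad} A_{jk} = \Bigl(\ldots,\ &\frac{\sin\lambda_j\cos A_{jk} - \sin\varphi_j\cos\lambda_j\sin A_{jk}}{s\sin Z_{jk}},\ \frac{-\cos\lambda_j\cos A_{jk} - \sin\varphi_j\sin\lambda_j\sin A_{jk}}{s\sin Z_{jk}},\ \frac{\cos\varphi_j\sin A_{jk}}{s\sin Z_{jk}}, \\
&\cot Z_{jk}\sin A_{jk},\ \ \sin\varphi_j - \cos\varphi_j\cos A_{jk}\cot Z_{jk}, \\
&-\frac{\sin\lambda_j\cos A_{jk} - \sin\varphi_j\cos\lambda_j\sin A_{jk}}{s\sin Z_{jk}},\ -\frac{-\cos\lambda_j\cos A_{jk} - \sin\varphi_j\sin\lambda_j\sin A_{jk}}{s\sin Z_{jk}},\ -\frac{\cos\varphi_j\sin A_{jk}}{s\sin Z_{jk}},\ -1,\ \ldots\Bigr).
\end{aligned}$$

The three entries for point $k$ are the negatives of those for point $j$, because $v$ depends only on the coordinate differences; the last entry belongs to the orientation unknown.

The linearized observation equation then is

$$\operatorname{grad} A_{jk}\, x = R_{jk} + \mathrm{Z}_j^0 - A_{jk}^0 - r_{jk} \tag{10.35}$$

where

$$A_{jk}^0 = \arctan\frac{-(X_k^0 - X_j^0)\sin\lambda_j^0 + (Y_k^0 - Y_j^0)\cos\lambda_j^0}{-(X_k^0 - X_j^0)\sin\varphi_j^0\cos\lambda_j^0 - (Y_k^0 - Y_j^0)\sin\varphi_j^0\sin\lambda_j^0 + (Z_k^0 - Z_j^0)\cos\varphi_j^0}.$$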

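Zenith distance The zenith distance $Z_{jk}$ is expressed as

$$Z_{jk} = \arccos\frac{v^{\mathrm T}u}{s} = \arccos\frac{(X_k - X_j)\cos\varphi\cos\lambda + (Y_k - Y_j)\cos\varphi\sin\lambda + (Z_k - Z_j)\sin\varphi}{\sqrt{(X_k - X_j)^2 + (Y_k - Y_j)^2 + (Z_k - Z_j)^2}}.$$

Again we linearize the observation equation; the unknowns are arranged in the following order

$$x = \begin{bmatrix} \ldots & x_j & y_j & z_j & d\Phi_j & d\Lambda_j & x_k & y_k & z_k & \ldots \end{bmatrix}^{\mathrm T}.$$

It is not trivial to calculate $\operatorname{grad} Z_{jk}$. We start by introducing the notation $Z_{jk} = \arccos t$ and get

$$\begin{aligned}
\frac{\partial Z_{jk}}{\partial X_j} &= -\frac{1}{\sqrt{1 - t^2}}\,\frac{s\,\frac{\partial (v^{\mathrm T}u)}{\partial X_j} - v^{\mathrm T}u\,\frac{\partial s}{\partial X_j}}{s^2} = -\frac{1}{\sin Z_{jk}}\,\frac{s(-\cos\varphi\cos\lambda) + v^{\mathrm T}u\,\frac{X_k - X_j}{s}}{s^2} \\
&= -\frac{1}{\sin Z_{jk}}\,\frac{-s\cos\varphi\cos\lambda + (X_k - X_j)\cos Z_{jk}}{s^2} = -\frac{(X_k - X_j)\cos Z_{jk} - s\cos\varphi\cos\lambda}{s^2\sin Z_{jk}}
\end{aligned}$$

and similarly for $\partial Z_{jk}/\partial X_k$, $\partial Z_{jk}/\partial Y_j$, ..., $\partial Z_{jk}/\partial Z_k$. The next derivation is

$$\frac{\partial Z_{jk}}{\partial\Phi} = -\frac{1}{\sin Z_{jk}}\,\frac{v^{\mathrm T}\frac{\partial u}{\partial\Phi}}{s} = -\frac{1}{\sin Z_{jk}}\,\frac{v^{\mathrm T}n}{s} = -\cos A_{jk}.$$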
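Finally

$$\frac{\partial Z_{jk}}{\partial\Lambda} = -\frac{1}{\sin Z_{jk}}\,\frac{v^{\mathrm T}\frac{\partial u}{\partial\Lambda}}{s} = -\frac{\cos\varphi\, v^{\mathrm T}e}{s\sin Z_{jk}} = -\frac{\cos\varphi\, s\sin Z_{jk}\sin A_{jk}}{s\sin Z_{jk}} = -\cos\varphi\sin A_{jk}.$$

Consequently

$$\begin{aligned}
\operatorname{grad} Z_{jk} = \Bigl(\ldots,\ &-\frac{(X_k - X_j)\cos Z_{jk} - s\cos\varphi_j\cos\lambda_j}{s^2\sin Z_{jk}},\ -\frac{(Y_k - Y_j)\cos Z_{jk} - s\cos\varphi_j\sin\lambda_j}{s^2\sin Z_{jk}},\ -\frac{(Z_k - Z_j)\cos Z_{jk} - s\sin\varphi_j}{s^2\sin Z_{jk}}, \\
&-\cos A_{jk},\ \ -\cos\varphi_j\sin A_{jk}, \\
&\frac{(X_k - X_j)\cos Z_{jk} - s\cos\varphi_j\cos\lambda_j}{s^2\sin Z_{jk}},\ \frac{(Y_k - Y_j)\cos Z_{jk} - s\cos\varphi_j\sin\lambda_j}{s^2\sin Z_{jk}},\ \frac{(Z_k - Z_j)\cos Z_{jk} - s\sin\varphi_j}{s^2\sin Z_{jk}},\ \ldots\Bigr).
\end{aligned}$$

The entries for point $k$ have the opposite sign of those for point $j$, because $v$ depends only on the coordinate differences.

The linearized observation equation for a zenith distance is

$$\operatorname{grad} Z_{jk}\, x = Z_{jk,\text{obsv}} - Z_{jk}^0 - r_{jk} \tag{10.36}$$

where

$$Z_{jk}^0 = \arccos\frac{(X_k^0 - X_j^0)\cos\varphi_j^0\cos\lambda_j^0 + (Y_k^0 - Y_j^0)\cos\varphi_j^0\sin\lambda_j^0 + (Z_k^0 - Z_j^0)\sin\varphi_j^0}{\sqrt{(X_k^0 - X_j^0)^2 + (Y_k^0 - Y_j^0)^2 + (Z_k^0 - Z_j^0)^2}}.$$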
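The signs of the zenith-distance derivatives can be checked in the same numerical way. The sketch below compares closed-form derivatives with respect to $X_j$, $Y_j$, $Z_j$, $\Phi$ and $\Lambda$ (taking the geodetic $\varphi$, $\lambda$ as stand-ins for the astronomical angles, which is an assumption of this example) against central differences. All numbers are hypothetical test values.

```python
import math


def frame_components(pj, pk, phi, lam):
    """Return s and the e, n, u components of v = pk - pj."""
    dx, dy, dz = (b - a for a, b in zip(pj, pk))
    sp, cp = math.sin(phi), math.cos(phi)
    sl, cl = math.sin(lam), math.cos(lam)
    s = math.sqrt(dx * dx + dy * dy + dz * dz)
    ve = -dx * sl + dy * cl
    vn = -dx * sp * cl - dy * sp * sl + dz * cp
    vu = dx * cp * cl + dy * cp * sl + dz * sp
    return s, ve, vn, vu


def zenith(params, pk):
    """Zenith distance for params = [X_j, Y_j, Z_j, phi, lam]."""
    s, _, _, vu = frame_components(params[:3], pk, params[3], params[4])
    return math.acos(vu / s)


def zenith_gradient(params, pk):
    """Closed-form derivatives with respect to X_j, Y_j, Z_j, phi, lam."""
    pj, phi, lam = params[:3], params[3], params[4]
    s, ve, vn, vu = frame_components(pj, pk, phi, lam)
    z = math.acos(vu / s)
    a = math.atan2(ve, vn)
    up = (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))
    denom = s * s * math.sin(z)
    coords = [-((pk[i] - pj[i]) * math.cos(z) - s * up[i]) / denom for i in range(3)]
    return coords + [-math.cos(a), -math.cos(phi) * math.sin(a)]


def numeric_gradient(params, pk, steps):
    grad = []
    for i, step in enumerate(steps):
        hi, lo = list(params), list(params)
        hi[i] += step
        lo[i] -= step
        grad.append((zenith(hi, pk) - zenith(lo, pk)) / (2.0 * step))
    return grad


params = [0.0, 0.0, 0.0, 0.9, 0.2]         # hypothetical X_j, Y_j, Z_j, phi, lam
pk = [120.0, -80.0, 45.0]                  # hypothetical target point
analytic = zenith_gradient(params, pk)
numeric = numeric_gradient(params, pk, [1e-3, 1e-3, 1e-3, 1e-6, 1e-6])
print(analytic, numeric)
```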


Slope Distance This is an easy case, similar to the 2-dimensional case. Only an extra term is added:

$$s_{jk} = \sqrt{v^{\mathrm T}v} = \sqrt{(X_k - X_j)^2 + (Y_k - Y_j)^2 + (Z_k - Z_j)^2}. \tag{10.37}$$

Distance observations are independent of the plumb line direction; consequently there are no unknowns for the corresponding parameters:

$$x = \begin{bmatrix} \ldots & x_j & y_j & z_j & x_k & y_k & z_k & \ldots \end{bmatrix}^{\mathrm T}$$

and we get

$$\operatorname{grad} s_{jk} = \Bigl(\ldots,\ -\frac{X_k - X_j}{s},\ -\frac{Y_k - Y_j}{s},\ -\frac{Z_k - Z_j}{s},\ \frac{X_k - X_j}{s},\ \frac{Y_k - Y_j}{s},\ \frac{Z_k - Z_j}{s},\ \ldots\Bigr).$$

The linearized observation equation is

$$\operatorname{grad} s_{jk}\, x = s_{jk,\text{obsv}} - s_{jk}^0 - r_{jk} \tag{10.38}$$

where

$$s_{jk}^0 = \sqrt{(X_k^0 - X_j^0)^2 + (Y_k^0 - Y_j^0)^2 + (Z_k^0 - Z_j^0)^2}.$$

Horizontal Distance This observation is just a special case of the combined slope distance, horizontal direction, and zenith distance with $Z_{jk} = \pi/2$.
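Leveled Height Difference Heights are conceptually related to the ellipsoid. So the most direct and easiest way to deal with height differences $h_{jk} = h_k - h_j$ is to calculate each term from the well-known formulas (10.28). We repeat them for convenience

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} (N_\varphi + h)\cos\varphi\cos\lambda \\ (N_\varphi + h)\cos\varphi\sin\lambda \\ \bigl((1-f)^2 N_\varphi + h\bigr)\sin\varphi \end{bmatrix}.$$

The solution of these nonlinear equations shall be presented according to an elegant development proposed by Clyde Goad. Suppose $(X, Y, Z)$ are given and we want to solve for $(\varphi, \lambda, h)$. By dividing the two upper equations we immediately obtain

$$\lambda = \arctan\frac{Y}{X}. \tag{10.39}$$

Of course, special precaution has to be taken if $X = 0$; in MATLAB code this happens by the statement lambda=atan2(Y,X). This function works well as long as either $X \neq 0$ or $Y \neq 0$. When $X = Y = 0$, $\lambda$ is undefined and thus can take on any arbitrary value. Next we introduce the distance $P$ from the point in question to the spin axis

$$P = \sqrt{X^2 + Y^2} = (N_\varphi + h)\cos\varphi \tag{10.40}$$

$$Z = \bigl((1-f)^2 N_\varphi + h\bigr)\sin\varphi \tag{10.41}$$

which is to be solved for $(\varphi, h)$. We start with the values

$$\begin{aligned} r &= \sqrt{P^2 + Z^2} \\ \varphi^0 &\approx \arcsin(Z/r) \\ h^0 &\approx r - a(1 - f\sin^2\varphi^0) \end{aligned}$$

and by means of equations (10.40) and (10.41) we get $(P^0, Z^0)$. The case $r = 0$ must be handled specially. Linearizing these equations we obtain

$$\begin{aligned}
Z(\varphi, h) &= Z(\varphi^0 + \Delta\varphi,\ h^0 + \Delta h) = Z(\varphi^0, h^0) + \frac{\partial Z}{\partial\varphi}\Delta\varphi + \frac{\partial Z}{\partial h}\Delta h + \cdots \\
P(\varphi, h) &= P(\varphi^0 + \Delta\varphi,\ h^0 + \Delta h) = P(\varphi^0, h^0) + \frac{\partial P}{\partial\varphi}\Delta\varphi + \frac{\partial P}{\partial h}\Delta h + \cdots
\end{aligned}$$

Neglecting higher order terms and rearranging we get a matrix form

$$\begin{bmatrix} Z - Z^0 \\ P - P^0 \end{bmatrix} = \begin{bmatrix} \Delta Z \\ \Delta P \end{bmatrix} = \begin{bmatrix} \dfrac{\partial Z}{\partial\varphi} & \dfrac{\partial Z}{\partial h} \\[2mm] \dfrac{\partial P}{\partial\varphi} & \dfrac{\partial P}{\partial h} \end{bmatrix} \begin{bmatrix} \Delta\varphi \\ \Delta h \end{bmatrix} = \begin{bmatrix} (N_\varphi + h)\cos\varphi & \sin\varphi \\ -(N_\varphi + h)\sin\varphi & \cos\varphi \end{bmatrix} \begin{bmatrix} \Delta\varphi \\ \Delta h \end{bmatrix} = \begin{bmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{bmatrix} \begin{bmatrix} (N_\varphi + h)\Delta\varphi \\ \Delta h \end{bmatrix}.$$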
As a valid approximation we have set $f = 0$ in the partial derivatives. We invert (i.e., we transpose) the orthogonal rotation matrix to get

$$\begin{bmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{bmatrix} \begin{bmatrix} \Delta Z \\ \Delta P \end{bmatrix} = \begin{bmatrix} (N_\varphi + h)\Delta\varphi \\ \Delta h \end{bmatrix}.$$

Finally

$$\Delta h = \sin\varphi\,\Delta Z + \cos\varphi\,\Delta P \tag{10.42}$$

$$\Delta\varphi = \frac{\cos\varphi\,\Delta Z - \sin\varphi\,\Delta P}{N_\varphi + h}. \tag{10.43}$$

The iterative procedure is continued until an adequate precision is achieved.

The M-file frgeod calculates Cartesian coordinates $(X, Y, Z)$ given geodetic coordinates $(\varphi, \lambda, h)$ according to (10.28). The M-file togeod calculates geodetic coordinates $(\varphi, \lambda, h)$ given the Cartesian coordinates $(X, Y, Z)$ according to the above derivation. Both functions are most useful when processing GPS vectors, cf. Section 14.4.
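The iteration can be sketched in Python as follows. This is not a transcription of the M-files frgeod and togeod; it is a minimal illustration of (10.28) and of the corrections (10.42) and (10.43). The function names, the stopping tolerance, the iteration limit, and the use of the WGS 84 values for $a$ and $f$ as defaults are assumptions made for this example.

```python
import math

A_WGS84 = 6378137.0              # semi-major axis in metres (assumed default)
F_WGS84 = 1.0 / 298.257223563    # flattening (assumed default)


def prime_vertical_radius(phi, a, f):
    """Radius of curvature in the prime vertical, equation (10.29)."""
    return a / math.sqrt(1.0 - f * (2.0 - f) * math.sin(phi) ** 2)


def geodetic_to_cartesian(phi, lam, h, a=A_WGS84, f=F_WGS84):
    """Equation (10.28): (phi, lam, h) -> (X, Y, Z)."""
    n = prime_vertical_radius(phi, a, f)
    p = (n + h) * math.cos(phi)
    return (p * math.cos(lam), p * math.sin(lam),
            ((1.0 - f) ** 2 * n + h) * math.sin(phi))


def cartesian_to_geodetic(x, y, z, a=A_WGS84, f=F_WGS84, tol=1e-6, max_iter=20):
    """Iterate the linearized corrections (10.42) and (10.43)."""
    lam = math.atan2(y, x)
    p = math.hypot(x, y)
    r = math.hypot(p, z)
    if r == 0.0:
        raise ValueError("geodetic coordinates are undefined at the origin")
    phi = math.asin(z / r)
    h = r - a * (1.0 - f * math.sin(phi) ** 2)
    for _ in range(max_iter):
        n = prime_vertical_radius(phi, a, f)
        dp = p - (n + h) * math.cos(phi)                      # P - P0
        dz = z - ((1.0 - f) ** 2 * n + h) * math.sin(phi)     # Z - Z0
        dh = math.sin(phi) * dz + math.cos(phi) * dp
        dphi = (math.cos(phi) * dz - math.sin(phi) * dp) / (n + h)
        phi += dphi
        h += dh
        if abs(dh) + abs(dphi) * (n + h) < tol:               # corrections in metres
            break
    return phi, lam, h


xyz = geodetic_to_cartesian(0.9, -2.0, 1234.5)    # hypothetical point
phi2, lam2, h2 = cartesian_to_geodetic(*xyz)
print(phi2, lam2, h2)
```

Because the partial derivatives use $f = 0$, each pass removes most but not all of the remaining error, so a few iterations are normally needed.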

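We prepare for the linearized observation equation for a height difference and rewrite equation (10.42): $\cos\varphi\,\Delta P + \sin\varphi\,\Delta Z = \Delta h$ or

$$\cos\varphi\left(\frac{\partial P}{\partial X}(x_k - x_j) + \frac{\partial P}{\partial Y}(y_k - y_j)\right) + \sin\varphi\,(z_k - z_j) = h_k - h_j.$$

The partial derivatives become

$$\frac{\partial P}{\partial X} = \frac{X}{P} = \cos\lambda \quad\text{and}\quad \frac{\partial P}{\partial Y} = \frac{Y}{P} = \sin\lambda.$$

The linearized observation equation for a height difference is

$$\operatorname{grad} h_{jk}\, x = h_{jk,\text{obsv}} - h_{jk}^0 - r_{jk}. \tag{10.44}$$

If the unknowns are arranged as follows

$$x = \begin{bmatrix} \ldots & x_j & y_j & z_j & x_k & y_k & z_k & \ldots \end{bmatrix}^{\mathrm T}$$

then

$$\operatorname{grad} h_{jk} = (\ldots,\ -\cos\varphi\cos\lambda,\ -\cos\varphi\sin\lambda,\ -\sin\varphi,\ \cos\varphi\cos\lambda,\ \cos\varphi\sin\lambda,\ \sin\varphi,\ \ldots).$$

One recognizes that $\operatorname{grad} h_{jk}\, x$ is the dot product between the third column of $R$ as defined in (10.32) with vector $v$ as described in (10.33).

A preliminary value $h_{jk}^0$ for the observed height difference is calculated by the M-file togeod at both sites $j$ and $k$.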


11 


LINEAR ALGEBRA FOR 
WEIGHTED LEAST SQUARES 


11.1 Gram-Schmidt on $A$ and Cholesky on $A^{\mathrm T}A$


The linear system is $Ax = b$. The matrix $A$ is $m$ by $n$ and its $n$ columns are independent. Then the $n$ by $n$ matrix $A^{\mathrm T}A$ is invertible and symmetric and positive definite.

The method of least squares yields an orthogonal projection of a point $b$ onto the subspace $R(A)$. This "column space" is spanned by the columns $a_1, \ldots, a_n$ of $A$. This is a linear subspace of $m$-dimensional space $\mathbf{R}^m$, but its axes (columns of $A$) are generally not orthogonal. The projection of $b$ is the combination $A\hat{x}$, where $\hat{x}$ satisfies the (weighted) normal equation $A^{\mathrm T}CA\hat{x} = A^{\mathrm T}Cb$.

In least squares we write $b$ as the combination

$$b = \hat{x}_1 a_1 + \hat{x}_2 a_2 + \cdots + \hat{x}_n a_n + \text{error } e. \tag{11.1}$$

Let the inner product between any two vectors $a$ and $b$ be defined as $(a, b) = a^{\mathrm T}Cb$. The symmetric positive definite matrix $C$ is the weight matrix.

We want to demonstrate by explicit algorithms that Cholesky's elimination method (on $A^{\mathrm T}CA$) is equivalent to a Gram-Schmidt orthogonalization procedure on the columns of $A$.

Let the orthonormalized columns be $q_i$. These are orthonormal in the "$C$-inner product" so that $(q_i, q_j) = q_i^{\mathrm T}Cq_j = \delta_{ij}$. The vectors $q_i$ are collected as columns of the matrix $Q$. Therefore we have $Q^{\mathrm T}CQ = I$. We define the matrix $R$ by $R = Q^{\mathrm T}CA$, so that

$$A = QR.$$

The matrix $R$ is upper triangular with nonnegative diagonal entries. This follows from the order in which the Gram-Schmidt orthogonalization is executed, one vector at a time.

It is numerically safer to work with orthogonal and triangular matrices, $Q$ and $R$. But we always modify the Gram-Schmidt algorithm, to subtract one projection at a time. And in most applications it is safe enough (and faster) to work directly with $A^{\mathrm T}CA$.
Calculation of R from Q and A


From the order of the Gram-Schmidt process, each new $A_i$ is a combination of $a_i$ and the vectors $A_1, \ldots, A_{i-1}$ that are already set. When the $A$'s are normalized to unit vectors, each new $q_i$ is a combination of $a_i$ and $q_1, \ldots, q_{i-1}$. Therefore $a_i$ is a combination of $q_1, \ldots, q_i$. An early $a$ does not involve a later $q$!

We can express this as a growing triangle:

$$a_1 = (q_1, a_1)q_1 \tag{11.2}$$

$$a_2 = (q_1, a_2)q_1 + (q_2, a_2)q_2 \tag{11.3}$$

$$a_3 = (q_1, a_3)q_1 + (q_2, a_3)q_2 + (q_3, a_3)q_3. \tag{11.4}$$

In matrix form this is exactly $A = QR$. The entry $r_{ij}$ is the inner product of $q_i$ with $a_j$. To see this, multiply to the left by $q_1^{\mathrm T}C$. Remember that $q_i^{\mathrm T}Cq_j = 0$ unless $i = j$:

$$r_{11} = (q_1, a_1), \qquad r_{12} = (q_1, a_2), \qquad r_{13} = (q_1, a_3).$$

Furthermore, a multiplication by $q_2^{\mathrm T}C$ yields

$$r_{22} = (q_2, a_2), \qquad r_{23} = (q_2, a_3).$$

Finally, a multiplication by $q_3^{\mathrm T}C$ gives

$$r_{33} = (q_3, a_3).$$

In the following step we use the expressions (11.2), (11.3), and (11.4) to form the inner products, and we arrange the system into a recursive solution:


$$\begin{aligned}
(a_1, a_1) &= (q_1, a_1)^2 & r_{11} &= (q_1, a_1) = \sqrt{(a_1, a_1)} \\
(a_1, a_2) &= (q_1, a_1)(q_1, a_2) & r_{12} &= (q_1, a_2) = \frac{(a_1, a_2)}{(q_1, a_1)} = \frac{(a_1, a_2)}{\sqrt{(a_1, a_1)}} \\
(a_1, a_3) &= (q_1, a_1)(q_1, a_3) & r_{13} &= (q_1, a_3) = \frac{(a_1, a_3)}{\sqrt{(a_1, a_1)}} \\
(a_2, a_2) &= (q_1, a_2)^2 + (q_2, a_2)^2 & r_{22} &= (q_2, a_2) = \sqrt{(a_2, a_2) - (q_1, a_2)^2} \\
(a_2, a_3) &= (q_1, a_2)(q_1, a_3) + (q_2, a_2)(q_2, a_3) & r_{23} &= (q_2, a_3) = \frac{(a_2, a_3) - (q_1, a_2)(q_1, a_3)}{(q_2, a_2)} = \frac{(a_2, a_3) - (q_1, a_2)(q_1, a_3)}{\sqrt{(a_2, a_2) - (q_1, a_2)^2}} \\
(a_3, a_3) &= (q_1, a_3)^2 + (q_2, a_3)^2 + (q_3, a_3)^2 & r_{33} &= (q_3, a_3) = \sqrt{(a_3, a_3) - (q_2, a_3)^2 - (q_1, a_3)^2}.
\end{aligned}$$


It is useful to recognize that the upper triangular matrix $R$ in $A = QR$ is also the Cholesky factor of the matrix $A^{\mathrm T}A$. If $A = QR$ then $A^{\mathrm T}A = (QR)^{\mathrm T}QR$. This equals $R^{\mathrm T}Q^{\mathrm T}QR$ which is $R^{\mathrm T}R$. Therefore $R$ is the upper triangular Cholesky factor for $A^{\mathrm T}A$:

$A^{\mathrm T}A$ factors into $R^{\mathrm T}R$ = (lower triangular)(upper triangular).

In actual calculations this offers two ways to compute $R$:

1 Apply Gram-Schmidt to $A$.
2 Work with the coefficient matrix $A^{\mathrm T}A$.


The first method is slightly slower. For full matrices Gram-Schmidt needs about $mn^2$ separate multiplications. This method is more stable numerically (we mean modified Gram-Schmidt of course). The errors are roughly proportional to the condition number $c(A)$.

The second method (direct solution of the normal equations) is faster. For full matrices it takes $\frac{1}{2}mn^2$ multiplications and additions to compute the $n^2$ entries of $A^{\mathrm T}A$ (which is symmetric!). Then elimination requires about $\frac{1}{6}n^3$, again halved by symmetry. This method works directly with the normal equations and it is by far the most frequent choice in practice, although numerically it is not quite as stable.


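Example 11.1 We use this algorithm on Example 4.16 with $C = I$:

$$A = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 0 & -3 \\ 0 & -2 & 3 \end{bmatrix}.$$

The result of the MATLAB command [Q,R]=qr(A) is

$$Q = \begin{bmatrix} -0.7071 & -0.4082 & 0.5774 \\ 0.7071 & -0.4082 & 0.5774 \\ 0 & 0.8165 & 0.5774 \end{bmatrix} = \begin{bmatrix} -\frac{1}{\sqrt 2} & -\frac{1}{\sqrt 6} & \frac{1}{\sqrt 3} \\ \frac{1}{\sqrt 2} & -\frac{1}{\sqrt 6} & \frac{1}{\sqrt 3} \\ 0 & \frac{2}{\sqrt 6} & \frac{1}{\sqrt 3} \end{bmatrix}$$

$$R = \begin{bmatrix} -1.4142 & -1.4142 & -4.2426 \\ 0 & -2.4495 & 2.4495 \\ 0 & 0 & 1.7321 \end{bmatrix} = \begin{bmatrix} -\sqrt 2 & -\sqrt 2 & -\sqrt{18} \\ 0 & -\sqrt 6 & \sqrt 6 \\ 0 & 0 & \sqrt 3 \end{bmatrix}.$$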

Example 4.16 showed how these numbers arise in Gram-Schmidt. 
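The equivalence of the two routes to $R$ can be tried out on the matrix of Example 11.1. The sketch below is written in plain Python rather than MATLAB; it runs a modified Gram-Schmidt in the $C$-inner product and a Cholesky factorization of $A^{\mathrm T}CA$, and compares the two triangular factors. Both routines normalize the diagonal of $R$ to be positive, so individual signs can differ from the output of a library QR routine. Function names are illustrative.

```python
import math


def inner(x, y, c):
    """C-inner product (x, y) = x^T C y."""
    size = len(x)
    return sum(x[i] * c[i][k] * y[k] for i in range(size) for k in range(size))


def modified_gram_schmidt(a, c):
    """Return (columns of Q, R) with A = QR and Q^T C Q = I."""
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]   # working copies
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = math.sqrt(inner(cols[j], cols[j], c))
        q = [x / r[j][j] for x in cols[j]]
        q_cols.append(q)
        for k in range(j + 1, n):                            # subtract one projection at a time
            r[j][k] = inner(q, cols[k], c)
            cols[k] = [x - r[j][k] * y for x, y in zip(cols[k], q)]
    return q_cols, r


def cholesky_upper(matrix):
    """Upper triangular R with R^T R = matrix."""
    size = len(matrix)
    r = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i, size):
            rest = matrix[i][j] - sum(r[k][i] * r[k][j] for k in range(i))
            r[i][j] = math.sqrt(rest) if i == j else rest / r[i][i]
    return r


A = [[1.0, 2.0, 3.0], [-1.0, 0.0, -3.0], [0.0, -2.0, 3.0]]
C = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
columns = [[row[j] for row in A] for j in range(3)]
N = [[inner(ci, cj, C) for cj in columns] for ci in columns]   # A^T C A

Q_cols, R_gs = modified_gram_schmidt(A, C)
R_chol = cholesky_upper(N)
print(R_gs)
print(R_chol)
```

Replacing `C` by another symmetric positive definite matrix exercises the weighted case discussed next.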


Now suppose that $C$ is not necessarily $I$. The normal equation matrix is $A^{\mathrm T}CA$. We still have $A = QR$, but now the columns $q_i$ are orthogonal in the $C$-inner product:

$$q_i^{\mathrm T}Cq_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \quad\text{and}\quad Q^{\mathrm T}CQ = I.$$

The matrix $R$ is still a Cholesky factor!

$$N = A^{\mathrm T}CA = (QR)^{\mathrm T}C(QR) = R^{\mathrm T}(Q^{\mathrm T}CQ)R = R^{\mathrm T}R.$$

Notice that $R = R^{-\mathrm T}N$ and $N^{-1} = R^{-1}R^{-\mathrm T}$. Further we augment $A$ by the right side $b$: $[A \ \ b]$. Set $R^{\mathrm T}z = A^{\mathrm T}Cb$ and $Rx = z$. Then

$$Nx = R^{\mathrm T}Rx = R^{\mathrm T}z = A^{\mathrm T}Cb.$$

Calculational procedure
1 Use Gram-Schmidt on $A$ to obtain $R$
2 Calculate $z = R^{-\mathrm T}A^{\mathrm T}Cb$
3 Solve $Rx = z$ by back substitution.


Thus the normal equations are solved.
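A more sophisticated procedure is to augment the normal matrix by $b$ to $\tilde{N}$:

$$\tilde{N} = \begin{bmatrix} A^{\mathrm T} \\ b^{\mathrm T} \end{bmatrix} C \begin{bmatrix} A & b \end{bmatrix} = \begin{bmatrix} A^{\mathrm T}CA & A^{\mathrm T}Cb \\ b^{\mathrm T}CA & b^{\mathrm T}Cb \end{bmatrix} = \begin{bmatrix} R^{\mathrm T}R & R^{\mathrm T}Q^{\mathrm T}Cb \\ b^{\mathrm T}CQR & b^{\mathrm T}Cb \end{bmatrix}. \tag{11.5}$$

Simultaneously we augment the matrix $R$:

$$\tilde{R} = \begin{bmatrix} R & z \\ 0 & s \end{bmatrix}. \tag{11.6}$$

Then

$$\tilde{R}^{\mathrm T}\tilde{R} = \begin{bmatrix} R^{\mathrm T} & 0 \\ z^{\mathrm T} & s \end{bmatrix} \begin{bmatrix} R & z \\ 0 & s \end{bmatrix} = \begin{bmatrix} R^{\mathrm T}R & R^{\mathrm T}z \\ z^{\mathrm T}R & z^{\mathrm T}z + s^2 \end{bmatrix}. \tag{11.7}$$

Comparing with (11.5) we get

$$z^{\mathrm T}z + s^2 = b^{\mathrm T}Cb.$$

Repeating Step 2 above we have $z = R^{-\mathrm T}A^{\mathrm T}Cb$ and

$$z^{\mathrm T}z = b^{\mathrm T}CAR^{-1}R^{-\mathrm T}A^{\mathrm T}Cb = b^{\mathrm T}CA\hat{x} \quad\text{and}\quad s^2 = b^{\mathrm T}Cb - b^{\mathrm T}CA\hat{x} = \hat{r}^{\mathrm T}C\hat{r}.$$

If we put $b^{\mathrm T}Cb$ in the lower right entry of $\tilde{N}$, then after the solution we recover $\hat{r}^{\mathrm T}C\hat{r}$ at the very same place. This square sum of residuals is valuable for estimating a lot of a posteriori variances. The residuals are defined by $A\hat{x} = b - \hat{r}$.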


11.2 Cholesky’s Method in the Least-Squares Setting 


We shall recast the least-squares problem and again solve it by means of orthogonal pro- 
jections. We consider the present procedure as the natural one from a geometrical point of 
view. The interesting fact to discover is that we do not need to know the orthonormalized 
columns q; of A explicitly. The idea is to make a conceptual short cut which offers a solu- 
tion to the least-squares problem directly from the method of Cholesky. It is considered an 
essential contribution and not just a nice variation of the traditional procedure! 
Let the observations be collected in the vector $b$ and let $\mathcal{A}_0$ be described in parameter form as the subspace of all $Ax$ (the column space) shifted by $t$:

$$y = Ax + t.$$

The least-squares problem is to determine that point $p \in \mathcal{A}_0$ which is closest to $b$. It is also the projection of $b$ onto $\mathcal{A}_0$. For practical reasons we first translate by $-t$; hence the problem is to find the projection of $l = b - t$ onto the linear subspace of all $y = Ax$. The resulting $\hat{x}$ remains unchanged under this translation; but we must add $t$ to the solution $p = A\hat{x}$.

To repeat: We want to find $\hat{x}$ such that

$$l - A\hat{x} \perp Ax \quad\text{for all } x. \tag{11.8}$$

In other words, we want to split $l = p + r$ into $A\hat{x}$ + perpendicular error.

If the columns $a_i$ of $A$ are the orthonormalized vectors $q_i$ we simply have

$$p = \sum_{i=1}^{n} (q_i, l)\,q_i \quad\text{and}\quad r = l - p. \tag{11.9}$$

The inner product $(a, b)$ is defined as $(a, b) = a^{\mathrm T}Cb$.

If the columns are not orthonormalized, we can make them so: Normalize the first column $a_1$. Suppose the first $i$ columns are orthonormalized. Then we simply write the next column as $a_{i+1} = p + r$ where $p$ is in the linear subspace spanned by the first $i$ columns (orthonormalized or not) and $r$ is perpendicular to this subspace. Thus we can use the Gram-Schmidt procedure as described in an earlier section, and the orthonormal version of $a_{i+1}$ is the normalized perpendicular part $r/\|r\| = q_{i+1}$.

Now we present the recursive procedure for solving the least-squares problem given by the observation equation matrix $A$ and the observations (the right side $b$) via the Cholesky method:

$$\begin{aligned}
a_1 &= (q_1, a_1)q_1 \\
(a_1, a_1) &= (q_1, a_1)^2 \\
(q_1, a_1) &= \sqrt{(a_1, a_1)} \\[2mm]
a_2 &= (q_1, a_2)q_1 + (q_2, a_2)q_2 \\
(a_1, a_2) &= (q_1, a_1)(q_1, a_2) \\
(q_1, a_2) &= \frac{(a_1, a_2)}{(q_1, a_1)} \\
(a_2, a_2) &= (q_1, a_2)^2 + (q_2, a_2)^2 \\
(q_2, a_2) &= \sqrt{(a_2, a_2) - (q_1, a_2)^2} \\[2mm]
a_3 &= (q_1, a_3)q_1 + (q_2, a_3)q_2 + (q_3, a_3)q_3 \\
(a_1, a_3) &= (q_1, a_1)(q_1, a_3) \\
(q_1, a_3) &= \frac{(a_1, a_3)}{(q_1, a_1)} \\
(a_2, a_3) &= (q_1, a_2)(q_1, a_3) + (q_2, a_2)(q_2, a_3) \\
(q_2, a_3) &= \frac{(a_2, a_3) - (q_1, a_2)(q_1, a_3)}{(q_2, a_2)} \\
(a_3, a_3) &= (q_1, a_3)^2 + (q_2, a_3)^2 + (q_3, a_3)^2 \\
(q_3, a_3) &= \sqrt{(a_3, a_3) - (q_2, a_3)^2 - (q_1, a_3)^2}.
\end{aligned}$$


We compare the present Cholesky method with the QR factorization in Section 11.1. In both cases we have

$$\begin{aligned}
r_{11} &= \sqrt{(a_1, a_1)} & r_{12} &= \frac{(a_1, a_2)}{\sqrt{(a_1, a_1)}} & r_{13} &= \frac{(a_1, a_3)}{\sqrt{(a_1, a_1)}} \\
r_{22} &= \sqrt{(a_2, a_2) - (q_1, a_2)^2} & r_{23} &= \frac{(a_2, a_3) - (q_1, a_2)(q_1, a_3)}{\sqrt{(a_2, a_2) - (q_1, a_2)^2}} \\
r_{33} &= \sqrt{(a_3, a_3) - (q_2, a_3)^2 - (q_1, a_3)^2}.
\end{aligned}$$

By this we have established that the upper triangular matrix $R$ in $A = QR$ is also the Cholesky factor of the matrix $A^{\mathrm T}A$, as proved directly in Section 11.1.

The interesting fact is that we do not need to find the orthonormal columns $q_i$ explicitly. So we have combined the Gram-Schmidt process and the normal equations and solve the latter by the Cholesky method.

We emphasize that (11.9) is equivalent to the normal equations.


As we proceed through this book we will recognize that matrix inversion is needed only for covariance information, not for equation solving. In both cases we prefer procedures based on the Cholesky factorization.


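Example 11.2 We describe a least-squares problem by the $A$ matrix given in Example 11.1, except we delete the last column:

$$A = \begin{bmatrix} 1 & 2 \\ -1 & 0 \\ 0 & -2 \end{bmatrix}.$$

The right side is $b = (1, -2, 2)$ and the weight matrix $C = I$. From the above procedure follows

$$R = \begin{bmatrix} (q_1, a_1) & (q_1, a_2) \\ 0 & (q_2, a_2) \end{bmatrix} = \begin{bmatrix} \sqrt 2 & \sqrt 2 \\ 0 & \sqrt 6 \end{bmatrix}.$$

Then

$$R^{-1} = \frac{1}{\sqrt{12}} \begin{bmatrix} \sqrt 6 & -\sqrt 2 \\ 0 & \sqrt 2 \end{bmatrix}.$$

Next

$$z = R^{-\mathrm T}A^{\mathrm T}b = \begin{bmatrix} \frac{3}{\sqrt 2} \\[1mm] -\frac{5}{\sqrt 6} \end{bmatrix}$$

and the solution is

$$\hat{x} = R^{-1}z = \begin{bmatrix} \frac{7}{3} \\[1mm] -\frac{5}{6} \end{bmatrix}.$$

Finally the projection $p$ of $b$ is

$$p = A\hat{x} = \frac{1}{3} \begin{bmatrix} 2 \\ -7 \\ 5 \end{bmatrix}.$$

According to (11.9) we also have

$$p = (q_1, b)q_1 + (q_2, b)q_2 = \frac{3}{\sqrt 2} \begin{bmatrix} \frac{1}{\sqrt 2} \\ -\frac{1}{\sqrt 2} \\ 0 \end{bmatrix} - \frac{5}{\sqrt 6} \begin{bmatrix} \frac{1}{\sqrt 6} \\ \frac{1}{\sqrt 6} \\ -\frac{2}{\sqrt 6} \end{bmatrix} = \frac{1}{3} \begin{bmatrix} 2 \\ -7 \\ 5 \end{bmatrix}.$$

The last expression is based on the explicit knowledge of $q_i$ whereas all other derivations in this example are not. The computationally intensive part is obtaining the inverse of $R$ (or solving the triangular system for $z$).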
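The numbers of Example 11.2 can be reproduced with the augmented factorization of Section 11.1: a Cholesky factorization of the normal matrix of $[A \ \ b]$ delivers $R$, $z$ and $s$ in one pass, and back substitution gives $\hat{x}$. The sketch below is a plain Python illustration with $C = I$; the helper name `cholesky_upper` is an assumption of this example.

```python
import math


def cholesky_upper(matrix):
    """Upper triangular R with R^T R = matrix (symmetric positive definite)."""
    size = len(matrix)
    r = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i, size):
            rest = matrix[i][j] - sum(r[k][i] * r[k][j] for k in range(i))
            r[i][j] = math.sqrt(rest) if i == j else rest / r[i][i]
    return r


A = [[1.0, 2.0], [-1.0, 0.0], [0.0, -2.0]]
b = [1.0, -2.0, 2.0]

# Normal matrix of the augmented matrix [A b] with C = I
aug = [row + [bi] for row, bi in zip(A, b)]
cols = list(zip(*aug))
n_aug = [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]

r_aug = cholesky_upper(n_aug)            # blocks [R z; 0 s]
n = len(A[0])
R = [row[:n] for row in r_aug[:n]]
z = [row[n] for row in r_aug[:n]]
s_squared = r_aug[n][n] ** 2

# Back substitution for R x = z
x_hat = [0.0] * n
for i in reversed(range(n)):
    x_hat[i] = (z[i] - sum(R[i][k] * x_hat[k] for k in range(i + 1, n))) / R[i][i]

p = [sum(a * x for a, x in zip(row, x_hat)) for row in A]
residual = [bi - pi for bi, pi in zip(b, p)]
print(x_hat, p, s_squared)
```

The value `s_squared` is the square sum of residuals, obtained without forming the residual vector.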


11.3 SVD: The Canonical Form for Geodesy 


In this section we want to introduce new coordinate systems in which the least-squares problem becomes simpler and thus more lucid. After the coordinate transformation we say that we have obtained the "canonical form" of the least-squares problem.

Let the linearized observation equations be

$$Ax = b \tag{11.10}$$

where $x$ is an $n$-dimensional vector describing the corrections to the unknowns (coordinates), and $b$ is the $m$-dimensional vector of observations. They are presumed uncorrelated and with equal weight. In other words, the covariance matrix for the observations is $\Sigma_b = I$.

The singular value decomposition demonstrates that it always is possible to find an orthogonal matrix $V$ (in the row space of $A$ or the coordinate space) and another $U$ (in the column space of $A$ or the observation space) so that

$$y = Vx \quad\text{and}\quad c = Ub. \tag{11.11}$$

Consequently equation (11.10) can be written

$$By = c \tag{11.12}$$

where

$$B = UAV^{\mathrm T}. \tag{11.13}$$
The SVD chooses $U$ and $V$ so that $B$ is of the form

$$B = \begin{bmatrix} D \\ 0 \end{bmatrix} \tag{11.14}$$

where the upper block has $n$ rows and the lower block has $m - n$ rows, with the diagonal matrix $D$

$$D = \begin{bmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_n \end{bmatrix}. \tag{11.15}$$

The matrix $0$ is an $(m - n)$ by $n$ zero matrix. Furthermore, all the singular values $\sigma_i$ are ordered decreasingly

$$\sigma_1 \geq \sigma_2 \geq \sigma_3 \geq \cdots \geq \sigma_n \geq 0.$$

Evidently we have

$$B^{\mathrm T}B = VA^{\mathrm T}U^{\mathrm T}UAV^{\mathrm T} = VA^{\mathrm T}AV^{\mathrm T} = D^2 \tag{11.16}$$

and

$$BB^{\mathrm T} = UAV^{\mathrm T}VA^{\mathrm T}U^{\mathrm T} = UAA^{\mathrm T}U^{\mathrm T} = \begin{bmatrix} D^2 & 0 \\ 0 & 0 \end{bmatrix}. \tag{11.17}$$

Remember that an orthogonal matrix $Q$ has $Q^{\mathrm T}Q = QQ^{\mathrm T} = I$. Now put

$$V^{\mathrm T} = (\varphi_1, \varphi_2, \ldots, \varphi_n) \tag{11.18}$$

and

$$U^{\mathrm T} = (\psi_1, \psi_2, \ldots, \psi_n, p_1, p_2, \ldots, p_{m-n}), \tag{11.19}$$

where $\{\varphi_i\}$ denotes an orthonormal set of $n$-dimensional vectors, and $\{\psi_i\} \cup \{p_i\}$ is another set of $m$-dimensional vectors. The transposed version of equation (11.11) shows that $y$ can be looked upon as the coefficients in the expansion of the vector $x$ on the set $\{\varphi_i\}$, and correspondingly $c$ can be conceived as coefficients in the expansion of $b$ on $\{\psi_i\}$ and $\{p_i\}$. We call $\{\varphi_i\}$, $\{\psi_i\}$ and $\{p_i\}$ the first, the second, and the third set of canonical vectors even though they are not determined uniquely.

A statistical interpretation of the canonical form of the least-squares problem is given by Scheffé (1959), Chapter I. The space spanned by the $\{p_i\}$ vectors is called the error space, and the space spanned by the $\{\psi_i\}$ is called the estimation space.

On the basis of (11.13) and (11.14) it is possible to show that the following relation between the first two sets of canonical vectors is valid:

$$A\varphi_i = \sigma_i\psi_i, \qquad i = 1, 2, \ldots, n. \tag{11.20}$$

Similarly, from the transposed version of (11.13) and (11.14) follows

$$A^{\mathrm T}\psi_i = \sigma_i\varphi_i, \qquad i = 1, 2, \ldots, n \tag{11.21}$$

$$A^{\mathrm T}p_i = 0, \qquad i = 1, 2, \ldots, m - n. \tag{11.22}$$
As the orthogonal transformation $U$ of $b$ leaves the observations $c$ in (11.12) weight normalized we can solve the least-squares problem via the normal equations

$$B^{\mathrm T}By = B^{\mathrm T}c \tag{11.23}$$

or

$$\sigma_j^2 y_j = \sigma_j c_j, \quad j = 1, 2, \ldots, n, \qquad\text{or finally}\qquad y_j = \frac{c_j}{\sigma_j}, \quad j = 1, 2, \ldots, r,$$

where $r$ denotes the number of nonzero $\sigma_i$'s and the Latin subscript $j$ denotes the $j$'th component of these vectors.
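For a matrix with only two columns the canonical form can be computed by hand, because the eigenvectors of the 2 by 2 matrix $A^{\mathrm T}A$ have a closed form. The sketch below does this for the matrix and right side of Example 11.2 and recovers the least-squares solution from $y_j = c_j/\sigma_j$. It is only an illustration: the closed-form eigenvector used here assumes a nonzero off-diagonal entry in $A^{\mathrm T}A$, and the function name is not from the text.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def canonical_solution(a, b):
    """Least-squares solution of a two-column problem via the canonical form."""
    cols = [[row[j] for row in a] for j in range(2)]
    n11, n12, n22 = dot(cols[0], cols[0]), dot(cols[0], cols[1]), dot(cols[1], cols[1])
    mean = 0.5 * (n11 + n22)
    radius = math.hypot(0.5 * (n11 - n22), n12)
    x = [0.0, 0.0]
    sigmas, coeffs = [], []
    for lam in (mean + radius, mean - radius):     # eigenvalues of A^T A, decreasing
        vec = (n12, lam - n11)                     # eigenvector, valid when n12 != 0
        norm = math.hypot(vec[0], vec[1])
        phi = [vec[0] / norm, vec[1] / norm]       # first set: phi_i
        sigma = math.sqrt(lam)
        psi = [(row[0] * phi[0] + row[1] * phi[1]) / sigma for row in a]  # A phi = sigma psi
        c = dot(psi, b)                            # coefficient of b along psi_i
        y = c / sigma                              # canonical unknown
        x = [x[0] + y * phi[0], x[1] + y * phi[1]]
        sigmas.append(sigma)
        coeffs.append(c)
    return x, sigmas, coeffs


A = [[1.0, 2.0], [-1.0, 0.0], [0.0, -2.0]]
b = [1.0, -2.0, 2.0]
x_hat, sigmas, coeffs = canonical_solution(A, b)
error_energy = dot(b, b) - sum(c * c for c in coeffs)   # part of b in the error space
print(x_hat, sigmas, error_energy)
```

The quantity `error_energy` is the squared length of the component of $b$ along the third set of canonical vectors.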
These facts lead to the following important consequences: 


1 About the first set of canonical vectors: The components of $x$ which are best determined are those in directions defined by $\varphi_i$ with large $\sigma_i$; entries in the direction of $\varphi_i$ with $\sigma_i = 0$ are totally undetermined.


In order that this result be interpreted correctly the corrections of the coordinates must be measured in approximately the same unit. Usually one wants all entries of $x$ to be determined with equal accuracy.


2 The second set reveals the observations which should have been performed with larger weight. It is so because we expand the individual observations, i.e. the unit vectors in the observation space, in $\{\psi_i\}$ and $\{p_i\}$ and subsequently choose those with dominating coefficients and corresponding to small values of $\sigma_i \neq 0$. These coefficients can be found by inspection of the matrix $U$. This is a consequence of the property $U^{\mathrm T}U = I$ which again can be taken as the one which yields the expansion of unit vectors into the sets $\{\psi_i\}$ and $\{p_i\}$.


3 For the third set we realize that the entries of the observations in directions determined by $\{p_i\}$ do not give any new information about the coordinates. This is also valid for $\psi_i$ with $i > r$, i.e. the eigenvectors corresponding to eigenvalues $\sigma_i = 0$. Maybe it is relevant to define the redundancy for the $i$'th observation in the following manner


(11.24) 


11.4 The Condition Number 


Geodetic network analysis is based on the fact that we can build the left side of the normals $A^{\mathrm T}CA$ when we know the topology of the network, described by $A$. We assume known weights $C$ of the observations. Hence we may calculate the covariance matrix $\Sigma_{\hat{x}} = (A^{\mathrm T}CA)^{-1}$. The actual observations only enter into the right side. So much network analysis can be performed without taking a single measurement.

One good measure for comparing various networks is the condition number. In the next section we define the condition number and demonstrate its use in connection with simple 1- and 2-dimensional geodetic networks.

The norm and condition number of a matrix play a vital role in the numerical calcu- 
lations and design problems of geodesy. Norm inequalities for vectors and matrices often 
are used for estimating the influence of roundoff errors and observational errors. The con- 
dition number also can be introduced in the optimum design of networks. This number 
helps to decide if one network design is better than another. 

The following matrix is typical for many least-squares problems:

$$A = \begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{bmatrix} \qquad (n \text{ by } n).$$

It is the coefficient matrix in the normal equation for


1 a regular traverse along the $x$-axis with postulated abscissas at both terminals. Only distances are measured and with equal weights.


2 a leveling line with postulated heights at both terminals. All observations are of equal weight.


3 $A^2$ is the coefficient matrix for a regular traverse along the $x$-axis with postulated $y$ values at the terminals. All angles are supposed to be measured with equal weights.


The eigenvalues of $A$ are

$$\lambda_i = 4\sin^2\frac{i\pi}{2(n+1)}, \qquad i = 1, 2, \ldots, n. \tag{11.25}$$

The condition number of a positive definite matrix is $\lambda_{\max}/\lambda_{\min}$. For this example we find

$$c(A) = \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{4\sin^2\frac{n\pi}{2(n+1)}}{4\sin^2\frac{\pi}{2(n+1)}} \approx \left(\frac{2(n+1)}{\pi}\right)^2 \approx \frac{4n^2}{\pi^2}.$$

Finally $c(A^2) = c(A)^2 \approx 0.2\,n^4$.
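The eigenvalue formula (11.25) and the growth of the condition number can be checked numerically without any linear algebra library. The sketch below assumes the standard eigenvectors $v_k = \sin\bigl(ki\pi/(n+1)\bigr)$ of the second-difference matrix (this form is an assumption of the example, it is not stated in this section) and measures the residual $Av - \lambda v$ directly. Function names are illustrative.

```python
import math


def eigenvalue(n, i):
    """Equation (11.25) for the n by n second-difference matrix."""
    return 4.0 * math.sin(i * math.pi / (2.0 * (n + 1))) ** 2


def eigen_residual(n, i):
    """max |(A v - lambda v)_k| for the candidate eigenvector v_k = sin(k i pi/(n+1))."""
    v = [math.sin(k * i * math.pi / (n + 1)) for k in range(1, n + 1)]
    lam = eigenvalue(n, i)
    worst = 0.0
    for k in range(n):
        left = v[k - 1] if k > 0 else 0.0
        right = v[k + 1] if k < n - 1 else 0.0
        worst = max(worst, abs(-left + 2.0 * v[k] - right - lam * v[k]))
    return worst


def condition_number(n):
    return eigenvalue(n, n) / eigenvalue(n, 1)


worst_residual = max(eigen_residual(8, i) for i in range(1, 9))
c100 = condition_number(100)
estimate = (2.0 * 101 / math.pi) ** 2
print(worst_residual, c100, estimate, c100 ** 2 / 100 ** 4)
```

The last printed ratio shows the size of the constant in front of $n^4$ for the angle traverse.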

This can be extended to a 2-dimensional leveling network, covering a rectangular area subdivided into $m$ by $n$ squares. All differences of height between neighboring points are observed with equal weight. The eigenvalues for this normal equation coefficient matrix are

$$\lambda_{jk} = 4\left(\sin^2\frac{j\pi}{2(m+1)} + \sin^2\frac{k\pi}{2(n+1)}\right), \qquad j = 1, \ldots, m, \quad k = 1, \ldots, n.$$

Let $p = \max(m, n)$ and we get the following estimate of the condition of the network:

$$c = \frac{\cos^2\frac{\pi}{2(m+1)} + \cos^2\frac{\pi}{2(n+1)}}{\sin^2\frac{\pi}{2(m+1)} + \sin^2\frac{\pi}{2(n+1)}} < \frac{2}{\sin^2\frac{\pi}{2(p+1)}} = \gamma p^2, \qquad 0 < \gamma < \text{constant}, \quad p > 10.$$

Comparing this condition number with the one from the leveling line we see that the condition number is doubled for the rectangular network. At the same time the number of points increased by the square of $p$. We may conclude: If the difference of height between two points has to be determined as accurately as possible, the network cannot be too "narrow." A genuinely 2-dimensional network has better condition number. Of course, the observational work increases tremendously compared to the accuracy. But this just confirms that accuracy costs money.


After this summary of elementary network analysis we shall derive the results in detail. 


11.5 Regularly Spaced Networks 


1-Dimensional Networks 


We start by considering 1-dimensional networks as shown in Figure 11.1 below. A straight line $P_1P_n$ is subdivided into smaller segments by the points $P_2, P_3, \ldots, P_{n-1}$. The length of each line segment between two consecutive points $P_{i-1}$ and $P_i$ is measured and the logarithm of the measured quantity is given with weight $c_i$. Thus the observation equations are

$$\ln(x_i - x_{i-1}) = s_i, \qquad \text{weight } c_i$$

where $x_i$ is the abscissa of point $P_i$. The linearized and weight normalized observation equations are

$$\sqrt{c_i}\,\frac{dx_i - dx_{i-1}}{x_i^0 - x_{i-1}^0} = \sqrt{c_i}\,ds_i.$$

We set

$$a_i = \frac{c_i}{(x_i^0 - x_{i-1}^0)^2};$$

then the $(n-2)$ by $(n-2)$ coefficient matrix of the normal equations is

$$N_d = \begin{bmatrix} a_2 + a_3 & -a_3 & & & \\ -a_3 & a_3 + a_4 & -a_4 & & \\ & -a_4 & a_4 + a_5 & -a_5 & \\ & & \ddots & \ddots & \ddots \\ & & & -a_{n-1} & a_{n-1} + a_n \end{bmatrix}. \tag{11.26}$$

We have omitted the unknowns $x_1$ and $x_n$ corresponding to fixing the network at the points $P_1$ and $P_n$.

[Figure: the points $P_1, P_2, P_3, \ldots, P_{n-2}, P_{n-1}, P_n$ along a straight line]

Figure 11.1 Regular traverse.
The matrix $N_d$ is now transformed to $N_p = DN_dD$ with the transforming diagonal matrix

$$D = \operatorname{diag}\bigl(-1, 1, -1, \ldots, (-1)^{n-2}\bigr).$$

The result is

$$N_p = \begin{bmatrix} a_2 + a_3 & a_3 & & & \\ a_3 & a_3 + a_4 & a_4 & & \\ & a_4 & a_4 + a_5 & a_5 & \\ & & \ddots & \ddots & \ddots \\ & & & a_{n-1} & a_{n-1} + a_n \end{bmatrix}. \tag{11.27}$$

Evidently, all entries of $N_p$ are nonnegative. Since the matrix is positive definite all its principal minors are positive; therefore the theory of oscillating matrices applies to $N_p$, see Gantmacher (1959), Chapter II, § 9.

If we transfer this theory for matrix (11.27) onto matrix (11.26) we get: 


Theorem 11.1 Let the eigenvalues of N4 be ordered as 
OS Ay < À2 <... < Apo: 


Let the eigenvector for A; be g1; the components of this eigenvector are nonzero and have 
the same sign. The eigenvector g2 corresponding to Az has a sequence of components 
with one change of sign. Generally, the eigenvector øg corresponding to eigenvalue A, has 
exactly k — 1 changes of sign in its components. 


Fora; = k, i = 2,3,...,n, we find the explicit expressions for the eigenvalues 
LIU 
hj; =Aksine ——_==, PAD ora gn 11.28 
i sin Xai) l n ( ) 


and non-normalized eigenvectors 


Lj 
n— 1 


gy; (i) = sin PE a are (ee a8 (11.29) 


The condition number is 
. (n—2)n \ 2 2 
SIN 47] 2(n — 1 
c(Nqg) = = a | N (== . 
sin z- T 
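From theorems in Gantmacher & Krein (1960), p. 127-129, the following inequalities can be set up, where $\lambda_n$ denotes the largest eigenvalue:

$$\min(a_i + a_{i+1}) - 2\max(a_i)\cos\frac{\pi}{n-1} \le \lambda_1 \le \max(a_i + a_{i+1}) - 2\min(a_i)\cos\frac{\pi}{n-1}$$

$$\min(a_i + a_{i+1}) + 2\min(a_i)\cos\frac{\pi}{n-1} \le \lambda_n \le \max(a_i + a_{i+1}) + 2\max(a_i)\cos\frac{\pi}{n-1}.$$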
For $k = \min(a_i + a_{i+1}) = \max(a_i + a_{i+1}) = 2\min(a_i) = 2\max(a_i)$ we get

$$\lambda_1 = k - k\cos\frac{\pi}{n-1} = 2k\sin^2\frac{\pi}{2(n-1)}$$

$$\lambda_n = k + k\cos\frac{\pi}{n-1} = 2k\cos^2\frac{\pi}{2(n-1)} = 2k\sin^2\frac{(n-2)\pi}{2(n-1)}.$$


Example 11.3 In order to illustrate the above theory we perform a small calculational 
experiment in MATLAB. Let 


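$$N = \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
Then
$$N_p = DND = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix}$$

which obviously has no negative entries. The eigenvectors of $N$ are the columns of

$$\Phi = \begin{bmatrix} 0.588 & 0.951 & 0.951 & 0.588 \\ 0.951 & 0.588 & -0.588 & -0.951 \\ 0.951 & -0.588 & -0.588 & 0.951 \\ 0.588 & -0.951 & 0.951 & -0.588 \end{bmatrix}$$

and the eigenvalues are arranged in increasing order

$$\Lambda = \begin{bmatrix} 0.382 & 1.382 & 2.618 & 3.618 \end{bmatrix}.$$
Now note that $\varphi_1$ has no change of sign, $\varphi_2$ has one change of sign, $\varphi_3$ has two changes of sign, and finally $\varphi_4$ has three changes of sign.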
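The numbers in Example 11.3 can be checked without MATLAB. The following is a minimal Python sketch, not the book's M-file: it evaluates the closed forms (11.28) and (11.29) for $n = 6$ points (which gives the 4 by 4 matrix $N$), confirms that each vector satisfies $N\varphi = \lambda\varphi$, and counts sign changes. The function names are hypothetical and chosen only for this illustration.

```python
import math

def traverse_eigenpairs(n, k=1.0):
    """Closed-form eigenvalues (11.28) and eigenvectors (11.29) for a_i = k."""
    size = n - 2
    pairs = []
    for i in range(1, size + 1):
        lam = 4.0 * k * math.sin(i * math.pi / (2 * (n - 1))) ** 2
        vec = [math.sin(i * j * math.pi / (n - 1)) for j in range(1, size + 1)]
        pairs.append((lam, vec))
    return pairs

def apply_normal_matrix(vec, k=1.0):
    """Multiply k * tridiag(-1, 2, -1) by vec without storing the matrix."""
    last = len(vec) - 1
    out = []
    for r, v in enumerate(vec):
        left = vec[r - 1] if r > 0 else 0.0
        right = vec[r + 1] if r < last else 0.0
        out.append(k * (2.0 * v - left - right))
    return out

def sign_changes(vec):
    """Count sign changes between consecutive components."""
    return sum(1 for a, b in zip(vec, vec[1:]) if a * b < 0)

pairs = traverse_eigenpairs(6)          # n = 6 points gives the 4 by 4 matrix N
eigenvalues = [round(lam, 3) for lam, _ in pairs]
changes = [sign_changes(vec) for _, vec in pairs]
residual = max(
    abs(y - lam * v)
    for lam, vec in pairs
    for y, v in zip(apply_normal_matrix(vec), vec)
)
print(eigenvalues)   # [0.382, 1.382, 2.618, 3.618]
print(changes)       # [0, 1, 2, 3]
```

The sign-change counts 0, 1, 2, 3 agree with Theorem 11.1, and the residual of $N\varphi - \lambda\varphi$ stays at rounding level.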


Another situation is the following: let the lengths of the individual sub-intervals be known, but suppose the line $P_1P_n$ has infinitesimal breaks at points $P_2, P_3, \ldots, P_{n-1}$ and further suppose that the angles $\beta_i$ at $P_i$ between $P_iP_{i-1}$ and $P_iP_{i+1}$ for $i = 2, 3, \ldots, n-1$ were measured. These angles are likely to be close to $\pi$. The linearized observation equations are


dy; —dy;_ dy; — dy; 
Ja (ge ~ ) = adb, E 
Xi — Xi XiT X 


We put 
Then the upper left 3 by 3 entries of the normal equation matrix $N_a$ become

$$\begin{bmatrix} (a_2+b_2)^2 + a_3^2 & -b_2(a_2+b_2) - a_3(a_3+b_3) & a_3b_3 \\ -b_2(a_2+b_2) - a_3(a_3+b_3) & b_2^2 + (a_3+b_3)^2 + a_4^2 & -b_3(a_3+b_3) - a_4(a_4+b_4) \\ a_3b_3 & -b_3(a_3+b_3) - a_4(a_4+b_4) & b_3^2 + (a_4+b_4)^2 + a_5^2 \end{bmatrix}.$$

The matrix $N_a$ can be transformed in a manner similar to $N_d$ to obtain an oscillating matrix. Theorem 11.1 is valid for the eigenvalues and eigenvectors of $N_a$.

In order to obtain explicit results let us set $a_i^2 = b_i^2 = k$. Then the eigenvectors are those given in (11.29) and the eigenvalues are

$$\lambda_i = 16k\sin^4\frac{i\pi}{2(n-1)}$$

and the condition number is
$$c(N_a) = \frac{\sin^4\frac{(n-2)\pi}{2(n-1)}}{\sin^4\frac{\pi}{2(n-1)}} \approx \Bigl(\frac{2(n-1)}{\pi}\Bigr)^4.$$


We repeat the derivations and this time we combine both distance and angle observations. We restrict all side lengths to be 1. That leads to simpler expressions. Again we introduce unknowns $x_2, x_3, \ldots, x_{n-1}$ and $y_2, y_3, \ldots, y_{n-1}$, see Figure 11.1. The distance observations $s_{i,i+1}$ only depend on the $x_i$'s and are of a simple linear type

$$s_{i,i+1} + r_{i,i+1} = -x_i + x_{i+1}.$$
The angular observations $\beta_i$ depend only on the $y_i$'s:
$$\beta_i = \arctan(y_i - y_{i+1}) - \arctan(y_{i-1} - y_i).$$
A linearization yields the following
$$\beta_i + r_i = y_i - y_{i+1} - (y_{i-1} - y_i) = -y_{i-1} + 2y_i - y_{i+1}.$$
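We gather all the linearized observation equations in matrix form

$$\begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix} \begin{bmatrix} x_2 \\ x_3 \\ \vdots \\ x_{n-1} \\ y_2 \\ \vdots \\ y_{n-2} \\ y_{n-1} \end{bmatrix} = \begin{bmatrix} s \\ \beta \end{bmatrix} + r. \tag{11.30}$$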
The number of observations is $2n-1$, the number of unknowns is $2n-4$, hence there are 3 redundant observations. We assume all observations of equal weight: $C = I$. The problem evidently splits into two: one for distances and one for angles, as the distances only depend on $x$ and the angles on $y$. For the distances we get

$$A_1^TA_1 = \begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{bmatrix}.$$


This $(n-2)$ by $(n-2)$ matrix is simple and we immediately quote the explicit expression for the inverse matrix

$$(A_1^TA_1)^{-1}_{ij} = \frac{j(n-1-i)}{n-1} \qquad\text{for } i \ge j \text{ and else symmetric.}$$

Especially note the variance at the mid-point $i = j = (n-1)/2$, which is

$$\sigma^2 = \frac{\frac{n-1}{2}\bigl(n-1-\frac{n-1}{2}\bigr)}{n-1} = \frac{n-1}{4}.$$


This result is in good agreement with the one from conventional least-squares. 
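The eigenvalues already are given in (11.28):

$$\lambda_i = 4\sin^2\frac{i\pi}{2(n-1)}, \qquad i = 1, 2, \ldots, n-2.$$

The condition number is $c(A_1^TA_1) = \lambda_{\max}/\lambda_{\min} \approx 4(n-1)^2\pi^{-2}$, and the norm is

$$\|A_1^TA_1\|_2 = 4\sin^2\frac{(n-2)\pi}{2(n-1)}.$$
Likewise for the angular observations
$$A_2^TA_2 = \begin{bmatrix} 6 & -4 & 1 & & & \\ -4 & 6 & -4 & 1 & & \\ 1 & -4 & 6 & -4 & 1 & \\ & \ddots & \ddots & \ddots & \ddots & \ddots \\ & 1 & -4 & 6 & -4 & 1 \\ & & 1 & -4 & 6 & -4 \\ & & & 1 & -4 & 6 \end{bmatrix}.$$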


The inverse of this $(n-2)$ by $(n-2)$ matrix is, for $i \le j$ and else symmetric,

$$(A_2^TA_2)^{-1}_{ij} = \frac{i(i+1)(n-1-j)(n-j)}{6n(n^2-1)} \bigl(2i(n-1-j) + (3n-1)(j-i) + n + 1\bigr).$$


This result is due to Torben Krarup who derived it for the present problem. He used a 
Gauss-like elimination technique. 
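The closed form for the inverse can be checked in exact arithmetic. The sketch below is an illustration only (it is not Krarup's elimination, and the function names are hypothetical): it builds the pentadiagonal matrix with bands 1, -4, 6, -4, 1, evaluates the inverse formula for $i \le j$ with the factor $(3n-1)(j-i)$, and multiplies the two matrices with Python's `fractions` module.

```python
from fractions import Fraction

def pentadiagonal(size):
    """A_2^T A_2: symmetric Toeplitz matrix with bands 1, -4, 6, -4, 1."""
    band = {0: 6, 1: -4, 2: 1}
    return [[Fraction(band.get(abs(r - c), 0)) for c in range(size)]
            for r in range(size)]

def closed_form_inverse(n):
    """Inverse of the (n-2) by (n-2) angle normal matrix from the formula."""
    size = n - 2
    g = [[Fraction(0)] * size for _ in range(size)]
    for i in range(1, size + 1):
        for j in range(i, size + 1):          # formula holds for i <= j
            lead = Fraction(i * (i + 1) * (n - 1 - j) * (n - j),
                            6 * n * (n * n - 1))
            bracket = 2 * i * (n - 1 - j) + (3 * n - 1) * (j - i) + n + 1
            g[i - 1][j - 1] = g[j - 1][i - 1] = lead * bracket
    return g

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

n = 9
inverse = closed_form_inverse(n)
identity = [[Fraction(int(r == c)) for c in range(n - 2)] for r in range(n - 2)]
is_inverse = matmul(pentadiagonal(n - 2), inverse) == identity
mid = inverse[(n - 1) // 2 - 1][(n - 1) // 2 - 1]   # diagonal term i = j = (n-1)/2
print(is_inverse, mid)
```

For $n = 9$ the mid-point entry equals $(n^2-1)(n^2+3)/(192n) = 35/9$.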
Historically geodesists have focused on the diagonal term $i = j = (n-1)/2$:
$$(A_2^TA_2)^{-1}_{\frac{n-1}{2},\frac{n-1}{2}} = \frac{\frac{n-1}{2}\cdot\frac{n+1}{2}\cdot\frac{n-1}{2}\cdot\frac{n+1}{2}}{6n(n^2-1)} \times \Bigl[(n-1)\bigl(n-1-\tfrac{n-1}{2}\bigr) + n + 1\Bigr]$$

$$= \frac{n^2-1}{96n}\cdot\frac{n^2+3}{2} = \frac{(n^2-1)(n^2+3)}{192n}.$$

This expression demonstrates that the standard deviation of the mid-point in a regular traverse grows like $n^{3/2}$.


2-Dimensional Networks 


An adequate description of adjustment problems in 2-dimensions is obtained by introducing the Kronecker product. Let $A = (a_{ij})$ and $B = (b_{ij})$ be $m$ by $n$ and $p$ by $q$ matrices, respectively. Then the Kronecker product

$$A \otimes B = (a_{ij}B) \tag{11.31}$$

is an $mp$ by $nq$ matrix expressible as a partitioned matrix with $a_{ij}B$ as the $(i, j)$th partition, $i = 1, \ldots, m$ and $j = 1, \ldots, n$.
The formal rules for operating with Kronecker products are as follows:

$0 \otimes A = A \otimes 0 = 0$ (i)
$(A_1 + A_2) \otimes B = (A_1 \otimes B) + (A_2 \otimes B)$ (ii)
$A \otimes (B_1 + B_2) = A \otimes B_1 + A \otimes B_2$ (iii)
$\alpha A \otimes \beta B = \alpha\beta(A \otimes B)$ (iv)
$A_1A_2 \otimes B_1B_2 = (A_1 \otimes B_1)(A_2 \otimes B_2)$ (v)
$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$ if the inverses exist (vi)
$(A \otimes B)^T = A^T \otimes B^T$ (vii)
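Rules (v) and (vii) are the ones used most in what follows, so a small numerical check may help. This is a minimal sketch in plain Python; the helper names `kron`, `matmul`, and `transpose` and the test matrices are assumptions made for the illustration.

```python
def kron(a, b):
    """Kronecker product of matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

# Arbitrary small matrices with compatible shapes.
A1 = [[1, 2], [3, 4]]
A2 = [[0, 1], [1, 1]]
B1 = [[2, 0, 1]]
B2 = [[1], [-1], [3]]

# Rule (v): A1 A2 (x) B1 B2 = (A1 (x) B1)(A2 (x) B2)
rule_v = kron(matmul(A1, A2), matmul(B1, B2)) == matmul(kron(A1, B1), kron(A2, B2))
# Rule (vii): (A (x) B)^T = A^T (x) B^T
rule_vii = transpose(kron(A1, B1)) == kron(transpose(A1), transpose(B1))
# A is 2 by 2 and B is 1 by 3, so A (x) B is mp by nq = 2 by 6.
shape = (len(kron(A1, B1)), len(kron(A1, B1)[0]))
print(rule_v, rule_vii, shape)
```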


Consider a regular leveling network consisting of $m$ by $n$ points in a rectangular mesh. Suppose the difference of height between neighboring points is observed with variance $\sigma^2$ and the individual differences are uncorrelated. Without loss of generality we set $\sigma^2 = 1$.

The height of point $P_{r,s}$ is denoted $h_{r,s}$ and the observed differences of height $d_{rp,sq}$ must be indexed with four variables. By means of the Kronecker product we denote the “horizontal” observations by

$$(I_m \otimes H_n)h = d_1$$

where the subscripts indicate the dimension of the square matrices. Similarly the “vertical” observations are

$$(H_m \otimes I_n)h = d_2$$

where $h$ is an $mn$ vector containing the heights of all “free” points, and $d_1$ and $d_2$ are $mn - m$ and $mn - n$ dimensional vectors containing the observed differences of height. Furthermore $H_n$ is the $(n-1)$ by $n$ dimensional matrix:

$$H_n = \begin{bmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \end{bmatrix}. \tag{11.32}$$


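As usual the normal equations are made by multiplying to the left by the transposed coefficient matrix; the “horizontal” contribution is

$$(I_m \otimes H_n)^T(I_m \otimes H_n) = (I_m \otimes H_n^T)(I_m \otimes H_n) = I_m \otimes (H_n^TH_n)$$

according to the computational rules (vii) and (v) for Kronecker products. The “vertical” contribution likewise is $(H_m^TH_m) \otimes I_n$. The total normal equation matrix is

$$N = I_m \otimes (H_n^TH_n) + (H_m^TH_m) \otimes I_n.$$

Again we look for the eigenvalues and eigenvectors of $N$. We define the following eigenvalue problems

$$H_n^TH_n\psi_n = \lambda\psi_n$$
$$H_m^TH_m\varphi_m = \mu\varphi_m$$
and
$$\Phi = \varphi_m \otimes \psi_n.$$
The total eigenvalue problem then is
$$\begin{aligned} N\Phi &= \bigl(I_m \otimes (H_n^TH_n) + (H_m^TH_m) \otimes I_n\bigr)(\varphi_m \otimes \psi_n) \\ &= I_m\varphi_m \otimes (H_n^TH_n)\psi_n + (H_m^TH_m)\varphi_m \otimes I_n\psi_n \\ &= \varphi_m \otimes \lambda\psi_n + \mu\varphi_m \otimes \psi_n = (\lambda + \mu)\,\varphi_m \otimes \psi_n = (\lambda + \mu)\Phi. \end{aligned}$$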
Note that $\lambda$ and $\mu$ are constants. The above derivation clearly shows that the eigenvalues for the total problem are given as the sum of the eigenvalues of the “horizontal” and the “vertical” problems. The explicit expressions for the eigenvalues can be shown to be

$$\lambda_i = 4\sin^2\frac{(i-1)\pi}{2m}, \qquad i = 1, \ldots, m$$

$$\mu_j = 4\sin^2\frac{(j-1)\pi}{2n}, \qquad j = 1, \ldots, n$$
and thus
$$\lambda_{ij} = 4\Bigl(\sin^2\frac{(i-1)\pi}{2m} + \sin^2\frac{(j-1)\pi}{2n}\Bigr). \tag{11.33}$$
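The additivity of the eigenvalues can be checked numerically on a small free network. The sketch below assumes, as a standard fact not stated in the text, that $H_n^TH_n$ has the cosine eigenvectors $\cos\bigl((2j-1)k\pi/(2n)\bigr)$, $j = 1, \ldots, n$, for the eigenvalue $4\sin^2(k\pi/(2n))$, $k = 0, \ldots, n-1$. All function names are hypothetical.

```python
import math

def chain_normal(n):
    """H_n^T H_n for the (n-1) by n difference matrix H_n of (11.32)."""
    k = [[0.0] * n for _ in range(n)]
    for r in range(n - 1):          # row r of H_n holds -1, 1 in columns r, r+1
        k[r][r] += 1.0
        k[r + 1][r + 1] += 1.0
        k[r][r + 1] -= 1.0
        k[r + 1][r] -= 1.0
    return k

def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def kron(a, b):
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def chain_eigenpair(n, k):
    """Eigenvalue 4 sin^2(k pi / 2n) with its cosine eigenvector, k = 0..n-1."""
    lam = 4.0 * math.sin(k * math.pi / (2 * n)) ** 2
    vec = [math.cos((2 * j - 1) * k * math.pi / (2 * n)) for j in range(1, n + 1)]
    return lam, vec

m, n = 3, 2
horizontal = kron(identity(m), chain_normal(n))
vertical = kron(chain_normal(m), identity(n))
N = [[h + v for h, v in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]

worst = 0.0
for km in range(m):
    mu, phi = chain_eigenpair(m, km)
    for kn in range(n):
        lam, psi = chain_eigenpair(n, kn)
        big_phi = [p * q for p in phi for q in psi]          # phi_m (x) psi_n
        image = [sum(c * v for c, v in zip(row, big_phi)) for row in N]
        worst = max(worst, max(abs(y - (lam + mu) * v)
                               for y, v in zip(image, big_phi)))
print(worst)   # rounding-level residual of N*Phi - (lambda + mu)*Phi
```

The pair $k = 0$ in both directions gives the eigenvalue 0, which reflects that a free leveling network is singular until a height is fixed.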


Accordingly a network fixed along all boundaries has the following sets of eigenvalues 


kesan -ee E E i E | 
2(m — 2) 
. jr 
= 4 si —~_, 2,... n=] 
Mj in 2(n — 2) J n 
and all eigenvalues for the total problem 
l LIC ; jr 
Ajj = 4{ sin* ———— oe |, 11.34 
ij (sin eae aa 5) Soe 


We shall estimate the condition number for this problem. Let p = max{m, n} and let m/n 
be bounded, then 


- 2 (m—2)n - 2 (n—2)x 
4(sin (m2) + sin re) _ 4p? 


c(N) = ae 


“2. 29 . 2 2 
4(sin Zmz + Sin s) 


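Example 11.4 We want to describe in detail the situation for $m = 3$ and $n = 2$. In this example the subscript of $H$ counts its rows, so $H_1 = \begin{bmatrix} -1 & 1 \end{bmatrix}$ and $H_2$ is 2 by 3. The horizontal observations are

$$(I_3 \otimes H_1)x = \left(\begin{bmatrix} 1 & & \\ & 1 & \\ & & 1 \end{bmatrix} \otimes \begin{bmatrix} -1 & 1 \end{bmatrix}\right) \begin{bmatrix} x_{1,1} \\ x_{2,1} \\ x_{3,1} \\ x_{1,2} \\ x_{2,2} \\ x_{3,2} \end{bmatrix} = b_1$$

or

$$\begin{bmatrix} -1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 1 \end{bmatrix} x = b_1.$$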
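Likewise the vertical observations $(H_2 \otimes I_2)x = b_2$:

$$\left[\begin{array}{rr|rr|rr} -1 & 0 & 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 & 1 \end{array}\right] \begin{bmatrix} x_{1,1} \\ x_{2,1} \\ x_{3,1} \\ x_{1,2} \\ x_{2,2} \\ x_{3,2} \end{bmatrix} = b_2.$$

The horizontal contribution to the normals is
$$(I_m \otimes H_{n-1})^T(I_m \otimes H_{n-1}) = (I_m \otimes H_{n-1}^T)(I_m \otimes H_{n-1}) = I_m \otimes H_{n-1}^TH_{n-1}$$
and the vertical contribution
$$(H_{m-1} \otimes I_n)^T(H_{m-1} \otimes I_n) = H_{m-1}^TH_{m-1} \otimes I_n.$$
In general the total normal equations are
$$N = I_m \otimes H_{n-1}^TH_{n-1} + H_{m-1}^TH_{m-1} \otimes I_n.$$
In our special case

$$N = I_3 \otimes H_1^TH_1 + H_2^TH_2 \otimes I_2 = \begin{bmatrix} 2 & -1 & -1 & 0 & 0 & 0 \\ -1 & 2 & 0 & -1 & 0 & 0 \\ -1 & 0 & 3 & -1 & -1 & 0 \\ 0 & -1 & -1 & 3 & 0 & -1 \\ 0 & 0 & -1 & 0 & 2 & -1 \\ 0 & 0 & 0 & -1 & -1 & 2 \end{bmatrix}.$$
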

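Example 11.5 Linear regression at unit times $t = 1, 2, \ldots, m$ has observation equations $Ax = b - r$:

$$t_ix_1 + x_2 = b_i - r_i.$$
With $C = I$ the matrix in the normal equations is

$$A^TA = \begin{bmatrix} 1 & 2 & \cdots & m \\ 1 & 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ \vdots & \vdots \\ m & 1 \end{bmatrix} = \begin{bmatrix} \frac16 m(m+1)(2m+1) & \frac12 m(m+1) \\ \frac12 m(m+1) & m \end{bmatrix}.$$

The determinant is $(m^4 - m^2)/12$. The eigenvalues are
$$\lambda = \frac{m\bigl[(m+1)(2m+1) + 6\bigr] \pm m\sqrt{(m+1)^2(2m+1)^2 + 12(m^2 + 3m + 5)}}{12}.$$

This expression can hardly be reduced so we make an asymptotic expansion. Since $\lambda_{\max}\lambda_{\min}$ is the determinant and $\lambda_{\max} + \lambda_{\min}$ is the trace, the condition number $c(A^TA) = \lambda_{\max}/\lambda_{\min}$ is approximately $\tfrac43 m^2 + 4m$ for large $m$. Here are exact values:

$m$ | 3 | 4 | 5 | 10
$c(A^TA)$ | 46.14 | 55.78 | 69.99 | 187.12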
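The tabulated values can be reproduced from the trace and determinant of the 2 by 2 normal matrix. This is a small sketch with a hypothetical function name; it uses the closed-form sums of $t$ and $t^2$ given above.

```python
import math

def regression_condition(m):
    """Condition number of A^T A for regression at times t = 1, ..., m."""
    sum_t2 = m * (m + 1) * (2 * m + 1) / 6    # entry (1,1): sum of t^2
    sum_t = m * (m + 1) / 2                   # off-diagonal entries: sum of t
    trace = sum_t2 + m
    det = sum_t2 * m - sum_t * sum_t
    root = math.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / (trace - root)    # lambda_max / lambda_min

table = {m: regression_condition(m) for m in (3, 4, 5, 10)}
for m, c in table.items():
    print(m, f"{c:.2f}")
```

For large $m$ the subtraction `trace - root` loses digits, so a production version would rather compute $\lambda_{\min}$ as `det / lambda_max`.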


Example 11.6 One might ask: What happens to the condition number under the elimination process? After some elimination steps the situation looks like

$$A = \begin{bmatrix} A_0 & B \\ B^T & C \end{bmatrix}$$

with
$$A^{-1} = \begin{bmatrix} A_0^{-1} + A_0^{-1}B(C - B^TA_0^{-1}B)^{-1}B^TA_0^{-1} & -A_0^{-1}B(C - B^TA_0^{-1}B)^{-1} \\ -(C - B^TA_0^{-1}B)^{-1}B^TA_0^{-1} & (C - B^TA_0^{-1}B)^{-1} \end{bmatrix}.$$

It is important to recognize that the (2,2) entry $(C - B^TA_0^{-1}B)^{-1}$ is symmetric. It is a principal submatrix of $A^{-1}$, so its eigenvalues lie between the extreme eigenvalues of $A^{-1}$ and its condition number cannot be greater than $c(A^{-1})$. As the condition numbers of a matrix and its inverse are equal we have $c\bigl((C - B^TA_0^{-1}B)^{-1}\bigr) \le c(A)$. Now $C - B^TA_0^{-1}B$ results from elimination of the first $\dim(A_0)$ rows of the $A$ matrix and therefore the answer is: The condition number does not increase during the elimination, for symmetric positive definite matrices.


The condition number for the least-squares problem of linear regression and the distance measurements in the regular traverse is $O(p^2)$, $p$ being the number of points. Many geodetic problems are of this type. A distinct other type is connected to observation of directions. This type has condition numbers proportional to $p^4$, much worse than the former.

The mathematical explanation is that the first type is connected to differential equations of second order while the latter type is connected to equations of the fourth order. Two-dimensional networks with combined distance and direction observations typically have condition numbers proportional to $p \ln p$.


Example 11.7 Table 11.1 indicates the error propagation in various geodetic networks. 
They are from Meissl (1981) and references therein. The variance changes depending on 
translation of origin, rotation of axes, and scale of the local coordinate system. 

The observations are assumed uncorrelated with standard deviation $\sigma_0$. The least-squares problem is solved by using minimal constraints. The error propagation is measured by the largest point variance $\sigma_P^2 = (\sigma_x^2 + \sigma_y^2)/2$.

The boundary conditions are most important in case of direction observations. 
Table 11.1 Asymptotic behavior of point variance $\sigma_P^2$


Square shaped networks
Direction observations: unknown translation, rotation, and scale.*
Error propagation depends on the boundary conditions:
Free boundary: $\sigma_P^2$ is proportional to
Distances are observed between all neighboring boundary points
Azimuths are observed between all neighboring boundary points
Distances and azimuths are both observed along the boundary
All boundary points are fixed | $\ln n$
Pure distance observations: unknown translation and rotation | $\ln n$
Azimuth observations: unknown translation and scale | $\ln n$
Distance and azimuth observations: unknown translation | $\ln n$
GPS networks: known translation, rotation, and scale | constant


Oblong shaped networks
Traverse
Chain of triangles


*This is also the photogrammetric case of observation of model coordinates.


A posteriori covariance In the beginning of the 1970’s intensive research was made to 
establish continuous analogues of actual discrete networks. The attempt was successful as 
far as absolute observation types were concerned: levelling, distances, and azimuths. 

A useful outcome of the continuous model is the equivalent of the covariance matrix, 
namely a covariance function. Such covariance functions are indispensable for network 
design and other advanced topics. 

The derivation of these covariance functions can be found in Borre (1989). A simple 
example is the covariance function for a regular, triangularly shaped network covering 
the whole plane, see Borre (1989), equation (3.77). The network has no boundary. The 
distance along all sides of the network is observed with weight a (per unit area) and the 
azimuth with weight b. 

To get an elegant formula we introduce the variables A = (9a + 3b)/4 and B = 
(3a + 9b)/4. Then the a posteriori covariance function with distance and azimuth obser- 
vations is nearly 


G(P, 0) = A+B ie 0 | A—B hove aa (11.35) 


4ABn O Inr = (A +B) sin2g — cos 2g 


where $(r, \varphi)$ denotes the polar coordinates for the difference vector between any two points $P$ and $Q$. The distance $r$ has to be measured in units of the side length.

The formula for $G(P, Q)$ demonstrates the “Taylor-von Kármán structure.” The radial error is a function of $\ln r$ while the transverse error is a function of $\varphi$ alone.
Figure 11.2 Confidence ellipses from the a priori covariance (the autocorrelation of a Gauss-Markov process) and the a posteriori covariance $G(P, Q)$ in (11.35). The 11 by 11 ellipses are calculated by fixing the midpoint $P$. The ellipses continue throughout the whole plane.


Hence G(P, Q) acts as a rational covariance function for the plane. Now we shall 
state some applications of G(P, Q). 


Example 11.8 In an equilateral triangular network we have observed all distances and azimuths with uniform weights $a = b = 2/(\sqrt{3}\,\sigma^2)$ for the whole network. Equal weights yield a homogeneous network. We have $A = B = 3a$ in (11.35):


R 1 Inr 0 2 
GP, Q) = Te | 0 wr (2 


The covariance function is independent of $\varphi$; this phenomenon is called isotropy.


Example 11.9 Consider the same network, but only with distance observations. Now $b = 0$ and $A = 9a/4$, $B = 3a/4$, and $a = 2/(\sqrt{3}\,\sigma^2)$. The covariance function is


linr 0 cos 29 sin 2@ 3 
G(P, eee ae eae | l 
2) = san il 0 A 4 fae Eoo 
By other methods a better approximation is found in Bartelme & Meissl (1974) for the term $\ln r$ in the first matrix, namely $\ln r + 0.599\ldots$.


Example 11.10 We estimate the variance between the abscissae of points P and Q in 
Example 11.9. Formal use of the fundamental solution yields 


o (rpo) = ae (mr — } cos 29)o°. 


Such formulas are good approximations for what Baarda calls criterion matrices, but the 
continuous model yields singular expressions at r = 0. 


Figure 11.2 illustrates the important difference between a priori and a posteriori co- 
variances. We plot confidence ellipses based on the expressions for autocorrelation for a 
Gauss-Markov process and (11.35) for covariances on an infinite regular network. As long 
as the shape of the network does not become pathological the present figures reflect the 
error situation quite well. 
The actual number of points is not felt when we are 2-3 side lengths inside the boundary. Therefore Figure 11.2 reflects the error situation for a quite general class of geodetic networks. We refer also to the asymptotic results in Table 11.1.


11.6 Dependency on the Weights 


A person who has gained a basic knowledge of least-squares problems soon asks the ques- 
tion: How important are the weights? If I change them a little how much does this change 
the solution? Let us try to get some insight to the problem. As usual we start by a set of 
observation equations Ax = b — r and with weight matrix C. 

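We separate the observation vector $b$ into two subvectors. At first we put no restrictions on the dimensions of $b_1$ and $b_2$. The vector $r$ and matrix $A$ and weight $C$ are

$$b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}, \qquad r = \begin{bmatrix} r_1 \\ r_2 \end{bmatrix}, \qquad A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}, \qquad C = \begin{bmatrix} C_1 & 0 \\ 0 & C_2 \end{bmatrix}.$$
This means there is no weight coupling between the two groups of observations. The original problem becomes two distinct problems:
$$A_1x = b_1 - r_1 \quad\text{with weight matrix } C_1$$

$$A_2x = b_2 - r_2 \quad\text{with weight matrix } C_2$$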


and we can calculate the solution of each system. Now the interesting question is: How do 
these solutions behave compared to the solution of the total problem? 

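We shall see how the solution $\hat x$ is changed by $\delta x$ when we change the weights $C_2$ of the second group to $C_2 + \delta C_2$. The expression for $\delta x$ involves matrix calculations leading to a useful matrix formula.

The normal equation for the original, total problem is

$$\begin{bmatrix} A_1^T & A_2^T \end{bmatrix} \begin{bmatrix} C_1 & 0 \\ 0 & C_2 \end{bmatrix} \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} \hat x = \begin{bmatrix} A_1^T & A_2^T \end{bmatrix} \begin{bmatrix} C_1 & 0 \\ 0 & C_2 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}.$$

This can be written as
$$(A_1^TC_1A_1 + A_2^TC_2A_2)\hat x = A_1^TC_1b_1 + A_2^TC_2b_2. \tag{11.36}$$

Notice how the normal equations from the two problems have been collected. Contributions to the normal equation possess an additive character. The perturbed problem is

$$\bigl(A_1^TC_1A_1 + A_2^T(C_2 + \delta C_2)A_2\bigr)(\hat x + \delta x) = A_1^TC_1b_1 + A_2^T(C_2 + \delta C_2)b_2. \tag{11.37}$$
Now subtract (11.36) from (11.37) to find an equation for $\delta x$:
$$\bigl(A_1^TC_1A_1 + A_2^T(C_2 + \delta C_2)A_2\bigr)\delta x + A_2^T\delta C_2A_2\hat x = A_2^T\delta C_2b_2. \tag{11.38}$$
We set $N = A_1^TC_1A_1 + A_2^TC_2A_2$ and $\hat r_2 = b_2 - A_2\hat x$. Then the change in $\hat x$ is

$$\delta x = (N + A_2^T\delta C_2A_2)^{-1}A_2^T\delta C_2\hat r_2. \tag{11.39}$$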
The following important formula, taken from (17.47), yields the change in the inverse:
$$(N + A_2^T\delta C_2A_2)^{-1} = N^{-1} - N^{-1}A_2^T\bigl(A_2N^{-1}A_2^T + (\delta C_2)^{-1}\bigr)^{-1}A_2N^{-1}. \tag{11.40}$$

For $n$ observations, this matrix multiplies $A_2^T\delta C_2\hat r_2$ to give $\delta x$. If we specialize to one single observation, $A_2^T$ becomes an $n$ by 1 matrix and $\delta C_2$ is 1 by 1. We name the product $A_2N^{-1}A_2^T = s$ and get

$$\delta x = N^{-1}A_2^T\Bigl(1 - \frac{\delta C_2}{1 + s\,\delta C_2}\,s\Bigr)\delta C_2\,\hat r_2$$
or
$$\delta x = \frac{\hat r_2\,\delta C_2}{1 + s\,\delta C_2}\,N^{-1}A_2^T. \tag{11.41}$$

The expression (11.41) reveals several interesting facts. The change $\delta x$ in the solution (in a first order approximation) is proportional to the change of weight $\delta C_2$. The change $\delta x$ increases most in the unknowns which are connected to the observations contained in $A_2$. The product $N^{-1}A_2^T$ only has contributions from the columns in $N^{-1}$ which correspond to entries in $A_2^T$ different from zero.

Already in 1823 C. F. Gauss found a similar expression for changes of a single weight $\delta C_2$. See the section ‘Updating the unknowns and their weights when the weight of an observation changes’ in Gauss (1995).


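Example 11.11 We want to demonstrate the procedure on a simple least-squares problem:

$$A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ -1 & 1 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} \quad\text{and}\quad C + \delta C_2 = \begin{bmatrix} 1 & & \\ & 2 & \\ & & 1 + \delta C_2 \end{bmatrix}.$$

Then $\delta C_2 = 0$ produces the solution $\hat x$ and the residual vector $\hat r$ as usual:

$$\hat x = \begin{bmatrix} 2/3 \\ 1/3 \end{bmatrix} \quad\text{and}\quad \hat r = b - A\hat x = \begin{bmatrix} 1 \\ -1/3 \\ 1/3 \end{bmatrix}.$$

Setting $\delta C_2 = -1$ corresponds to eliminating the third observation. Then $\hat x = (3, -1)$ is the intersection point of the first two rows. Varying $\delta C_2$ from $-1$ to $\infty$ we get the line segment from $(3, -1)$ to $(5/11, 5/11)$ on the third row. Letting all weights vary, we evidently can obtain solutions anywhere in the interior of the triangle.
The M-file dw reflects what goes on.


Figure 11.3 Dependence of $\hat x$ on $C_2$ in Example 11.11. [The figure marks $\hat x$ for $\delta C_2 = 0$, the point $(3, -1)$ for $\delta C_2 = -1$, and $\hat x$ lying on row 3 when $\delta C_2 = \infty$.]


Figure 11.4 The dependence of the projection on the norm. The figure is drawn in $C_1$-norm. [The figure shows the projections $p_1$ and $p_2$.]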
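The update formula (11.41) can be tried on this example in exact arithmetic. The sketch below is not the M-file dw; it is an independent illustration in Python with hypothetical helper names. It compares $\hat x + \delta x$ from (11.41) with a fresh solution of the reweighted normal equations.

```python
from fractions import Fraction as F

A = [[1, 1], [1, 2], [-1, 1]]
b = [2, 1, 0]

def normal_equations(weights):
    """Return N = A^T C A and f = A^T C b for a diagonal weight matrix C."""
    n = [[sum(w * r[i] * r[j] for w, r in zip(weights, A)) for j in range(2)]
         for i in range(2)]
    f = [sum(w * r[i] * bi for w, r, bi in zip(weights, A, b)) for i in range(2)]
    return n, f

def inverse_2x2(n):
    det = n[0][0] * n[1][1] - n[0][1] * n[1][0]
    return [[n[1][1] / det, -n[0][1] / det], [-n[1][0] / det, n[0][0] / det]]

def solve(weights):
    n, f = normal_equations(weights)
    inv = inverse_2x2(n)
    return [inv[i][0] * f[0] + inv[i][1] * f[1] for i in range(2)]

base = [F(1), F(2), F(1)]                      # C with delta C_2 = 0
x_hat = solve(base)                            # (2/3, 1/3)
n_inv = inverse_2x2(normal_equations(base)[0])
a2 = A[2]                                      # row of the reweighted observation
r2 = b[2] - sum(a * x for a, x in zip(a2, x_hat))
n_inv_a2 = [sum(n_inv[i][j] * a2[j] for j in range(2)) for i in range(2)]
s = sum(a * v for a, v in zip(a2, n_inv_a2))   # s = A_2 N^{-1} A_2^T

def updated_solution(dc):
    """x_hat + delta_x with delta_x taken from (11.41)."""
    factor = dc * r2 / (1 + s * dc)
    return [x + factor * v for x, v in zip(x_hat, n_inv_a2)]

print(x_hat, updated_solution(F(-1)))
```

With $\delta C_2 = -1$ the update lands on $(3, -1)$, and for very large $\delta C_2$ it approaches $(5/11, 5/11)$, as described in the example.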


This approach to changes of weight is excellent for a small number of observations. For a more qualitative knowledge about a larger problem, we shall turn to a more powerful tool.

Now we ask for more quantitative results: If the weights are changed, how much can the projection $p = Pb$ move in the column space of $A$? A measure for this movement is

$$\tan\alpha = \frac{\|p_2 - p_1\|_{C_1}}{\|b - p_1\|_{C_1}}. \tag{11.42}$$

The norm is weighted by $\|x\|_{C_1} = \|C_1^{1/2}x\|$ and $\alpha$ is the angle at $b$ which spans $p_1p_2$. Note that Figure 11.4 is drawn in $C_1$-norm. The angle at $p_2$ is $C_2$-orthogonal; but in the $C_1$-norm the angle is $\frac{\pi}{2} - \alpha$. The change from $C_1$ to $C_2$-norm leads to a change of the angle from $\frac{\pi}{2}$ to $\frac{\pi}{2} - \alpha$.

The square root $W$ of a matrix $C$ is the positive definite matrix that satisfies $W^2 = C$. Such a matrix certainly exists, is unique, and nonsingular. We define $D = W_1^{-1}C_2W_1^{-1}$, whose condition number $c(D)$, the ratio between the largest and smallest eigenvalue, can be related to the angle $\alpha$:

$$2|\tan\alpha| \le \sqrt{c(D)} - \frac{1}{\sqrt{c(D)}}. \tag{11.43}$$


Let us repeat: A least-squares problem is given by the coefficient matrix $A$, the weight $C_0$, and the observations $b$. We consider all weight matrices $C$ such that the eigenvalues of $W_0^{-1}CW_0^{-1}$ lie in the closed interval $[s, t]$. In the $C_0$-norm we always find

$$|\tan\alpha| \le \tfrac12\Bigl(\sqrt{t/s} - \sqrt{s/t}\Bigr). \tag{11.44}$$


The angle $\alpha$ measures the displacement of the least-squares result as seen from $b$. At least one matrix $C$ exists for which the equality sign is valid. Furthermore, the ratio between the norms of the residual vectors corresponding to $C_1$ and $C_2$ is bounded by


l llb — pille, 
< ———— < yà (11.45) 
A/ Amat Ib aa P2iic> i 


where $\lambda_{\max}$ is the largest eigenvalue of the matrix $W_2^{-1}C_1W_2^{-1}$.
The main results about the changes of weight in a least-squares problem are quoted from Krarup (1972). Once again the condition number is prominent. We are saying that the effect of ignoring a possible correlation between the observations can be dangerous if the condition number is large.


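Example 11.12 We introduce an $n$ by $n$ covariance matrix with strong correlation:

$$\Sigma_b = \begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{bmatrix}.$$
Its inverse has the special form
$$(\Sigma_b^{-1})_{ij} = \begin{cases} \dfrac{i(n+1-j)}{n+1} & \text{for } i \le j \\[2mm] \dfrac{j(n+1-i)}{n+1} & \text{for } i > j. \end{cases}$$

The eigenvalues are $4\sin^2\frac{j\pi}{2(n+1)}$ so the condition is $c(D) \approx \frac{4(n+1)^2}{\pi^2} < n^2$. By (11.44)

$$|\tan\alpha| \le \tfrac12\Bigl(\sqrt{c} - \frac{1}{\sqrt{c}}\Bigr) \approx \frac{1}{\pi}(n+1) \to \infty.$$

A strong correlation can take us arbitrarily far from the solution corresponding to $C_1 = I$.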


Example 11.13 The correlation is weaker if D = C2 = I +t£}' andt ~ 107°: 


$$c(D) = \frac{1 + 4t\sin^2\frac{n\pi}{2(n+1)}}{1 + 4t\sin^2\frac{\pi}{2(n+1)}} \approx 1 + 4t.$$

Hence

$$|\tan\alpha| \le \tfrac12\Bigl(\sqrt{1+4t} - \frac{1}{\sqrt{1+4t}}\Bigr) = \frac{2t}{\sqrt{1+4t}}.$$

Thus $|\tan\alpha| < 2t$.
A Classical Procedure


Now we want to describe how to eliminate an unknown from a least-squares problem. 
Some unknowns may be of very little importance. They are introduced into a least-squares 
problem in order to treat correlation in a proper way; but otherwise these unknowns are of 
no interest. Sometimes such unknowns are called nuisance parameters, like the orientation 
unknowns in direction observations made by theodolite. 

To begin with we will assume that all observation equations have weight 1; otherwise 
they would be normalized by the square root of the actual weight. 
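The actual procedure is easily described by means of a concrete example:

$$\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 3 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} 0 \\ 8 \\ 8 \\ 20 \end{bmatrix} - r. \tag{11.46}$$

The normal equations are

$$\begin{bmatrix} 4 & 8 \\ 8 & 26 \end{bmatrix} \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} 36 \\ 112 \end{bmatrix} \tag{11.47}$$

and the solution is

$$\begin{bmatrix} \hat c \\ \hat d \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \end{bmatrix}.$$

If we solve the equations according to the method of Cholesky the computations run as follows:


1 Triangular decomposition of the left side
$l_{11} = \sqrt{4} = 2$
$l_{21} = 8/2 = 4 \qquad l_{22} = \sqrt{26 - 16} = \sqrt{10}$


2 Forward elimination on the right side
$z_1 = 36/2 = 18$
$z_2 = (112 - 4 \cdot 18)/\sqrt{10} = 40/\sqrt{10}$


3 Back solution

$d = 40/(\sqrt{10} \cdot \sqrt{10}) = 4$

$c = (18 - 4 \cdot 4)/2 = 1.$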
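The three Cholesky steps above can be sketched in a few lines of Python. The variable names follow the text ($l_{11}$, $l_{21}$, $l_{22}$, $z_1$, $z_2$); the last line, which evaluates $b^Tb - z^Tz$, is added here only as a check on the residual sum of squares.

```python
import math

A = [[1, 0], [1, 1], [1, 3], [1, 4]]
b = [0, 8, 8, 20]

# Normal equations A^T A x = A^T b with unit weights, see (11.47).
n11 = sum(r[0] * r[0] for r in A)
n12 = sum(r[0] * r[1] for r in A)
n22 = sum(r[1] * r[1] for r in A)
f1 = sum(r[0] * bi for r, bi in zip(A, b))
f2 = sum(r[1] * bi for r, bi in zip(A, b))

# 1 Triangular decomposition of the left side
l11 = math.sqrt(n11)
l21 = n12 / l11
l22 = math.sqrt(n22 - l21 * l21)

# 2 Forward elimination on the right side
z1 = f1 / l11
z2 = (f2 - l21 * z1) / l22

# 3 Back solution
d = z2 / l22
c = (z1 - l21 * d) / l11

# Residual sum of squares without forming the residuals: b^T b - z^T z
rss = sum(bi * bi for bi in b) - (z1 * z1 + z2 * z2)
print(c, d, rss)   # approximately 1, 4 and 44
```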
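Starting from this example we shall study the following trick: We augment the existing observation equations with a fictitious equation. It is the sum of the given equations and it is assigned the weight $c_{fe} = -\frac{1}{t} = -\frac14$. (The sum of the identical entries in the first column of $A$ is $t$.) This new equation is

$$4c + 8d = 36 \quad\text{with weight } -\tfrac14. \tag{11.48}$$

The augmented normal equation system has a singular matrix $A^TCA$:

$$\begin{bmatrix} 1 & 1 & 1 & 1 & 4 \\ 0 & 1 & 3 & 4 & 8 \end{bmatrix} \begin{bmatrix} 1 & & & & \\ & 1 & & & \\ & & 1 & & \\ & & & 1 & \\ & & & & -\frac14 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 3 \\ 1 & 4 \\ 4 & 8 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 10 \end{bmatrix}.$$
The system $A^TCAx = A^TCb$ is still solvable (always):

$$\begin{bmatrix} 0 & 0 \\ 0 & 10 \end{bmatrix} \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} 0 \\ 40 \end{bmatrix} \tag{11.49}$$

which yields $d = 4$. By insertion into (11.48) we get $c = 1$. At first sight the singular normal equations (11.49) have a surprising structure which we want to illustrate.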
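The effect of the fictitious equation is easy to reproduce. The following minimal sketch (plain Python with exact fractions, hypothetical variable names) appends the weighted sum of the observation equations with weight $-1/\sum c_i$ and forms the augmented normal equations.

```python
from fractions import Fraction as F

rows = [[1, 0], [1, 1], [1, 3], [1, 4]]
b = [0, 8, 8, 20]
weights = [F(1)] * 4                 # all observation equations have weight 1

# Fictitious equation: weighted sum of all equations, with weight -1/sum(c_i).
total = sum(weights)
fict_row = [sum(w * r[j] for w, r in zip(weights, rows)) for j in range(2)]
fict_b = sum(w * bi for w, bi in zip(weights, b))
rows_aug = rows + [fict_row]
b_aug = b + [fict_b]
w_aug = weights + [-1 / total]

# Augmented normal equations A^T C A x = A^T C b.
N = [[sum(w * r[i] * r[j] for w, r in zip(w_aug, rows_aug)) for j in range(2)]
     for i in range(2)]
f = [sum(w * r[i] * bi for w, r, bi in zip(w_aug, rows_aug, b_aug))
     for i in range(2)]

d = f[1] / N[1][1]                               # remaining unknown from 10 d = 40
c = (fict_b - fict_row[1] * d) / fict_row[0]     # insertion into 4c + 8d = 36
print(N, f, c, d)
```

The first row and column of the augmented matrix vanish, which is exactly the elimination of $c$.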

Comparing the two systems of normal equations (11.47) and (11.49) it becomes 
evident that the unknown c has been eliminated. Our earlier standard method of elimination 
was elementary row operations. We demonstrate the method on the actual numbers: 


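$$\begin{aligned} 4c + 8d &= 36 \\ 8c + 26d &= 112 \end{aligned} \quad\longrightarrow\quad \begin{aligned} 4c + 8d &= 36 \\ 10d &= 40, \end{aligned} \tag{11.50}$$

and this reveals nothing new.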

Earlier we assumed that all entries are identical in the column of A corresponding to 
the unknown to be eliminated. In practice these entries often are 1. 

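If the weights $c_i$ of the single observations are varying then the weight of the fictitious equation has to be changed to $c_{fe} = -1/\sum c_i$ and the summation in $A$ has to be performed as a weighted summation.

The variance $\sigma_c^2$ of the eliminated unknown $c$ is calculated as follows: In order to calculate the inverse matrix of the normal equations we use the same row operations as above on the unit matrix:

$$\begin{bmatrix} 4 & 8 \\ 8 & 26 \end{bmatrix} \begin{bmatrix} v_1 & v_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

or

$$(A^TCA)^{-1} = \begin{bmatrix} v_1 & v_2 \end{bmatrix} = \frac{1}{20} \begin{bmatrix} 13 & -4 \\ -4 & 2 \end{bmatrix}.$$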

This result, of course, is also obtained when using ordinary methods for the inversion of 
the coefficient matrix (11.47). 
In order to calculate the variance factor $\hat\sigma_0^2$ we have to use (9.71)

$$\hat r^TC\hat r = b^TCb - z^Tz = b^TCb - \sum z_i^2 = 64 + 64 + 400 - (324 + 160) = 44.$$
An identical result is produced by calculating

$$p = A\hat x = \begin{bmatrix} 1 \\ 5 \\ 13 \\ 17 \end{bmatrix} \quad\text{and then}\quad \hat r = b - p = \begin{bmatrix} -1 \\ 3 \\ -5 \\ 3 \end{bmatrix}.$$

Subsequently, $\hat\sigma_0^2 = 44/(4 - 2) = 22$. Note that $n$ is not to be reduced because of the elimination. Implicitly there still are $n$ unknowns. Finally

$$\hat\sigma_c^2 = \frac{22 \cdot 26}{40} = 14.3 \quad\text{and}\quad \hat\sigma_d^2 = \frac{22 \cdot 4}{40} = 2.2.$$
We summarize the method: In a least-squares problem we want to eliminate an unknown 
with constant coefficients. This is done by augmenting the original problem with a ficti- 
tious observation equation which results as the sum of all observation equations in which 
this unknown appears. The new equation is given the weight —1/(sum of coefficients of the 
unknown). Next, the remaining n — 1 normal equations are solved in usual manner. Sub- 
sequently, the eliminated unknown can be calculated from the new observation equation 
by insertion of the solution. The variance of the unknown is calculated from the normal 
equations by using unit vectors as right sides and by using the same row operations as for 
the elimination of the unknown. 


Eliminating From the Normal Equations 


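We want to make the above description more general and cogent by using the technique of block elimination. Let the normal equations be split as follows (remember $B = C^T$):

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}.$$
Block elimination subtracts $CA^{-1}$ times the first row $[A \ \ B]$ and $b_1$ from the second row. This is achieved by multiplying to the left with the elimination matrix:

$$\begin{bmatrix} I & 0 \\ -CA^{-1} & I \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} I & 0 \\ -CA^{-1} & I \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$

or

$$\begin{bmatrix} A & B \\ 0 & D - CA^{-1}B \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 - CA^{-1}b_1 \end{bmatrix}.$$

The last row contains the wanted expression for the remaining unknown:

$$(D - CA^{-1}B)x_2 = b_2 - CA^{-1}b_1$$

or explicitly

$$x_2 = (D - CA^{-1}B)^{-1}(b_2 - CA^{-1}b_1). \tag{11.51}$$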


This formula is coded as the M-file elimnor. 
Eliminating Parameters: A Reduced Estimation Problem


Suppose the vector $x$ of modeling parameters is separated into an unimportant part $y$ and an important part $z$. Then we can eliminate $y$ from the normal equations and solve only for $\hat z$. And we can return to find $\hat y$ if we want. It is useful to describe those steps.

We will execute a standard elimination of $\hat y$ from the normal equations for $\hat x = [\hat y \ \ \hat z]^T$. Start with the observation equations

$$b = Ax + e = By + Gz + e. \tag{11.52}$$

Denote the weight matrix $\Sigma_b^{-1}$ by $C$. The normal equations are

$$\begin{bmatrix} B^T \\ G^T \end{bmatrix} C \begin{bmatrix} B & G \end{bmatrix} \begin{bmatrix} \hat y \\ \hat z \end{bmatrix} = \begin{bmatrix} B^T \\ G^T \end{bmatrix} Cb. \tag{11.53}$$
This produces two block equations for $\hat y$ and $\hat z$:
$$\begin{bmatrix} B^TCB & B^TCG \\ G^TCB & G^TCG \end{bmatrix} \begin{bmatrix} \hat y \\ \hat z \end{bmatrix} = \begin{bmatrix} B^TCb \\ G^TCb \end{bmatrix}. \tag{11.54}$$

To eliminate $\hat y$, multiply row 1 by $G^TCB(B^TCB)^{-1}$ and subtract from row 2. This produces a zero block in row 2, column 1. It leaves an equation for $\hat z$ alone, with a modified weighting matrix $C'$:

$$G^TC'G\hat z = G^TC'b \quad\text{with}\quad C' = C - CB(B^TCB)^{-1}B^TC. \tag{11.55}$$

Note that $C'B$ is the zero matrix. The algebra has confirmed what we could have expected: the reduced model for $\hat z$ alone has a smaller weight matrix $C'$ (and a larger covariance matrix) because the $By$ term is projected out. As always, back substitution in (11.54) yields $\hat y$ when we know $\hat z$:

$$\hat y = (B^TCB)^{-1}B^TC(b - G\hat z). \tag{11.56}$$
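A small exact computation may make (11.55) and (11.56) concrete. The sketch below reuses the observations of (11.46) with $B$ the column of ones and $G$ the time column; the unequal diagonal weights are a hypothetical choice made only for this illustration, and the helper names are not from the text.

```python
from fractions import Fraction as F

t = [0, 1, 3, 4]
b = [0, 8, 8, 20]
w = [F(1), F(2), F(1), F(2)]     # hypothetical diagonal weight matrix C
B = [1, 1, 1, 1]                 # column of the nuisance unknown y
G = t                            # column of the wanted unknown z

def reduced_weight(w, B):
    """C' = C - C B (B^T C B)^{-1} B^T C for diagonal C and one column B."""
    cb = [wi * bi for wi, bi in zip(w, B)]
    btcb = sum(ci * bi for ci, bi in zip(cb, B))
    size = len(w)
    return [[(w[r] if r == c else 0) - cb[r] * cb[c] / btcb for c in range(size)]
            for r in range(size)]

def quad(u, M, v):
    """u^T M v for lists u, v and a square matrix M."""
    return sum(u[r] * M[r][c] * v[c] for r in range(len(u)) for c in range(len(v)))

C_red = reduced_weight(w, B)
z_hat = quad(G, C_red, b) / quad(G, C_red, G)                    # (11.55)
y_hat = (sum(wi * bi * (obs - g * z_hat) for wi, bi, obs, g in zip(w, B, b, G))
         / sum(wi * bi * bi for wi, bi in zip(w, B)))            # (11.56)
cb_zero = all(sum(C_red[r][c] * B[c] for c in range(4)) == 0 for r in range(4))
print(y_hat, z_hat, cb_zero)
```

Solving the full 2 by 2 weighted normal equations directly gives the same pair $(\hat y, \hat z) = (152/89, 368/89)$.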


Eliminating From the Observation Equations 


If we use the above procedure directly on the observation equations it works formally, but we generally get a wrong result. The column space must be split into two subspaces conforming to the splitting of $A$ into $[A_1 \ \ A_2]$, and furthermore, the two subspaces must be orthogonal complements.

Let the observation equations be partitioned as follows

$$\begin{bmatrix} A_1 & A_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = b. \tag{11.57}$$

The columns of $A = [A_1 \ \ A_2]$ span $R(A)$. We decompose $R(A)$ into the space spanned by the columns of $A_1$ and its orthogonal complement $R^\perp$ within $R(A)$. By definition we have: $R(A)$ is spanned by $[A_1 \ \ A_2^*]$ and $A_1^TCA_2^* = 0$, where the columns of $A_2^*$ span $R^\perp$.

Figure 11.5 Geometry of the orthogonal decomposition in the column space of $A = [A_1 \ \ A_2]$.

The projector $P = I - A_1(A_1^TCA_1)^{-1}A_1^TC$ projects $A_2$ in $R(A)$ to $A_2^*$ in $R^\perp$:
$$A_2^* = \bigl(I - A_1(A_1^TCA_1)^{-1}A_1^TC\bigr)A_2.$$
The reduced observation equations are
$$\bigl(I - A_1(A_1^TCA_1)^{-1}A_1^TC\bigr)A_2x_2 = \bigl(I - A_1(A_1^TCA_1)^{-1}A_1^TC\bigr)b$$
or abbreviated
$$A_2^*x_2 = b^*.$$

Next we solve for $\hat x_2$:

$$\hat x_2 = (A_2^{*T}CA_2^*)^{-1}A_2^{*T}Cb^*. \tag{11.58}$$
The above procedure has important applications. When processing GPS observations we may want to eliminate the ambiguity unknowns. We rewrite (11.57) as

$$A_1x_1 + A_2x_2 = b.$$

The unknowns $x_1$ are eliminated by multiplication by the projector $P$:

$$PA_1x_1 + PA_2x_2 = Pb.$$
As $PA_1 = 0$ the transformed observation equations become
$$A_2^*x_2 = b^*.$$

We depict the geometry of this orthogonal decomposition in Figure 11.5.

Finally we want to demonstrate the procedure on the observations from (11.46). The coefficient matrix $A$ is split into the two columns $[A_1 \ \ A_2]$. Hence the projector $P$ becomes

$$P = \frac14 \begin{bmatrix} 3 & -1 & -1 & -1 \\ -1 & 3 & -1 & -1 \\ -1 & -1 & 3 & -1 \\ -1 & -1 & -1 & 3 \end{bmatrix}.$$

The transformed observation equations $A_2^*x_2 = b^*$ are

$$\begin{bmatrix} -2 \\ -1 \\ 1 \\ 2 \end{bmatrix} x_2 = \begin{bmatrix} -9 \\ -1 \\ -1 \\ 11 \end{bmatrix}.$$

The normals are $10x_2 = 40$ and the solution is again recovered as $\hat x_2 = 4$.
The procedure is coded in the M-file elimobs. The reader may find details on error 
calculation in Meissl (1982), Section A.9. 
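The numbers in this example can be reproduced with a few lines of Python. This is a sketch for $C = I$ and a single column $A_1$ only, not the M-file elimobs; the function name `projector` is hypothetical.

```python
from fractions import Fraction as F

A1 = [1, 1, 1, 1]        # column of the unknown to be eliminated
A2 = [0, 1, 3, 4]        # column of the remaining unknown
b = [0, 8, 8, 20]

def projector(a1):
    """P = I - A1 (A1^T A1)^{-1} A1^T for unit weights and one column A1."""
    s = sum(a * a for a in a1)
    size = len(a1)
    return [[F(int(r == c)) - F(a1[r] * a1[c], s) for c in range(size)]
            for r in range(size)]

def apply(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]

P = projector(A1)
A2_star = apply(P, A2)   # transformed column A2*
b_star = apply(P, b)     # transformed observations b*
x2_hat = sum(a * v for a, v in zip(A2_star, b_star)) / sum(a * a for a in A2_star)
print(A2_star, b_star, x2_hat)
```

With unit weights and a column of ones, the projector simply subtracts the mean from each column, which is why $A_2^*$ and $b^*$ are the centered versions of $A_2$ and $b$.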


11.8 Decorrelation and Weight Normalization 


Most computer programs for solving least-squares problems read the observation equations 
by rows and store the relevant contributions at the matching places of the normal equation 
matrix. The ith observation equation from the ith row of A can be immediately stored at 
the correct places in the normal equation matrix ATCA 


(A'CA)ou + 4, Ai Ai SA CA) ae. (11.59) 


This procedure works as long as the individual observations are uncorrelated. If they are 
dependent the procedure is inapplicable. We first have to decorrelate the observations. This 
happens by diagonalizing the C matrix: 


A'=TA (11.60) 
and 
(A’)'A’ = A'TITA=A'CA. (11.61) 


Such a transformation T not only decorrelates the transformed observations but can also 
guarantee these equations have the weight 1. 
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As the covariance matrix C~! is symmetric and positive definite we can use the 
method of Cholesky for factorization: 


y=ct=w'wt (11.62) 
or 
C = WW (11.63) 
By substituting (11.61) we get 
T=W. (11.64) 


This transformation thus described is given as 
A = WA and b =Wb and r = Wr. 


It is easy to show, see below, that x remains unchanged under this special transformation 
and that the a priori covariance matrix of the transformed observations is simplified to 


Ly =. (11.65) 


This decomposition of the covariance matrix $\Sigma_b$ and the transformation of $A$ and $b$ is often
called decorrelation of the observations. Strictly speaking it is the transformed
observations which are decorrelated. In practice you need only to calculate $W^{-1}$ for each set of
correlated observations and then solve the equations

$$W^{-1}A' = A \quad\text{and}\quad W^{-1}b' = b \tag{11.66}$$

for $A'$ and $b'$ by one forward reduction. Although $A$ contains several correlated rows, the
calculation of $A'$ proceeds column-wise. After the decomposition, $A'$ and $b'$ are added to
the normal equations as described in (11.59). (The Cholesky factors $W^{-1}$ of the covariance
matrix of the observations are not to be confused with the usual L-factors for the normal
equations.)

The transformation is a change of basis in the column space of $T$. This column space
is determined uniquely for each set of correlated observations. Any such linear
transformation leaves $A^TCA$, $A^TCb$, and $x$ unchanged. The transformations can be performed
to secure simultaneously a unit weight matrix for the correlated observation equations as
demonstrated in (11.65).

The weighted sum of squares of the residuals $r^TCr$ equals the square sum of the
transformed residuals

$$r^TCr = \sum_{i=1}^m r_i'^{\,2}.$$

The actual residuals $r_i$ are calculated as

$$r = W^{-1}r'.$$
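Example 11.14 We give a numerical example demonstrating the theory in all details, and
use a covariance matrix for double differenced phase observations. We generalize $D_d$ as
described in (14.29) to $r = 3$ and $s = 4$:

$$D_d = \begin{bmatrix}
-1 & 1 & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0\\
-1 & 0 & 1 & 0 & 1 & 0 & -1 & 0 & 0 & 0 & 0 & 0\\
-1 & 0 & 0 & 1 & 1 & 0 & 0 & -1 & 0 & 0 & 0 & 0\\
-1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0\\
-1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & -1 & 0\\
-1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1
\end{bmatrix}.$$

The covariance matrix $\Sigma_d = D_d(\sigma^2 I)D_d^T = C^{-1}$ is (with $\sigma^2 = 1$)

$$\Sigma_d = \begin{bmatrix}
4 & 2 & 2 & 2 & 1 & 1\\
2 & 4 & 2 & 1 & 2 & 1\\
2 & 2 & 4 & 1 & 1 & 2\\
2 & 1 & 1 & 4 & 2 & 2\\
1 & 2 & 1 & 2 & 4 & 2\\
1 & 1 & 2 & 2 & 2 & 4
\end{bmatrix}.$$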


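The first step is to factorize $C^{-1}$ into $W^{-1}W^{-T}$. The result is

$$W^{-1} = \begin{bmatrix}
2 & 0 & 0 & 0 & 0 & 0\\
1 & \sqrt3 & 0 & 0 & 0 & 0\\
1 & \frac{1}{\sqrt3} & \frac{2\sqrt2}{\sqrt3} & 0 & 0 & 0\\
1 & 0 & 0 & \sqrt3 & 0 & 0\\
\frac12 & \frac{\sqrt3}{2} & 0 & \frac{\sqrt3}{2} & \frac32 & 0\\
\frac12 & \frac{1}{2\sqrt3} & \frac{\sqrt2}{\sqrt3} & \frac{\sqrt3}{2} & \frac12 & \sqrt2
\end{bmatrix}.$$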


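Notice that $W^{-1}$ has only nonnegative entries, like $\Sigma_d$. The second step is to calculate the
inverse:

$$W = \begin{bmatrix}
\frac12 & 0 & 0 & 0 & 0 & 0\\
-\frac{1}{2\sqrt3} & \frac{1}{\sqrt3} & 0 & 0 & 0 & 0\\
-\frac{1}{2\sqrt6} & -\frac{1}{2\sqrt6} & \frac{\sqrt3}{2\sqrt2} & 0 & 0 & 0\\
-\frac{1}{2\sqrt3} & 0 & 0 & \frac{1}{\sqrt3} & 0 & 0\\
\frac16 & -\frac13 & 0 & -\frac13 & \frac23 & 0\\
\frac{1}{6\sqrt2} & \frac{1}{6\sqrt2} & -\frac{1}{2\sqrt2} & -\frac{1}{3\sqrt2} & -\frac{1}{3\sqrt2} & \frac{1}{\sqrt2}
\end{bmatrix}.$$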
Note that the diagonal entries of $W^{-1}$ and $W$ are mutually inverse. The zero entries are
placed at identical places in the two matrices and they share other properties which are
common to all triangular matrices: The product of two lower (upper) triangular matrices
is again lower (upper) triangular, and the inverse of a nonsingular, triangular matrix is also
triangular.

After a left multiplication with W of both sides of the original correlated observation 
equations, the independent observations can be read in by the least-squares program like 
any other independent observations. 
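
The two steps of Example 11.14 can be checked numerically. The following is a minimal sketch in plain Python, not the book's M-file: the function names `cholesky_lower` and `forward_substitution` and the observation vector `b` are illustrative assumptions. It factorizes the covariance matrix of the six double differences and then obtains the decorrelated observations $b'$ by one forward reduction, as in (11.66).

```python
import math

def cholesky_lower(S):
    """Return the lower-triangular L with L L^T = S (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_substitution(L, b):
    """Solve L y = b for a lower-triangular L."""
    y = []
    for i, row in enumerate(L):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y

# Covariance matrix of the six double differences (r = 3, s = 4), unit variance.
sigma_d = [
    [4, 2, 2, 2, 1, 1],
    [2, 4, 2, 1, 2, 1],
    [2, 2, 4, 1, 1, 2],
    [2, 1, 1, 4, 2, 2],
    [1, 2, 1, 2, 4, 2],
    [1, 1, 2, 2, 2, 4],
]

W_inv = cholesky_lower(sigma_d)            # plays the role of W^{-1} in (11.62)
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]         # hypothetical correlated observations
b_prime = forward_substitution(W_inv, b)   # solves W^{-1} b' = b, i.e. b' = W b
```

The same forward reduction applied to each column of $A$ gives $A'$; the explicit inverse $W$ is never needed.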


All least-squares problems treated so far have involved only a single type of observation:
leveling, directions, or distances. Each observation has been assigned a weight according
to the weight relation (9.44).

Next we shall demonstrate how to combine various types of observations into one
least-squares problem. The key is to divide the single observation equation by its standard
deviation. Or in other words to multiply the single observation equation by the square
root of the pertinent weight. By this the observation equations become dimensionless;
subsequently they enter a least-squares problem. We say that the observation equations are
weight normalized:

$$WAx = Wb - Wr, \tag{11.67}$$
and by this the normal equations are
$$A^TW^TWAx = A^TW^TWb \tag{11.68}$$
or
$$A^TCAx = A^TCb. \tag{11.69}$$

They are the correct normal equations with weight matrix $C$.
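
As a small illustration of (11.67) to (11.69), the sketch below estimates a single unknown from three observations of differing quality. The numbers are invented for the example. Dividing each equation by its standard deviation and then solving unweighted normals gives the same estimate as the weighted normals with $c_i = 1/\sigma_i^2$.

```python
# Hypothetical observations of one unknown x: (observed value b_i, standard deviation sigma_i).
obs = [(10.02, 0.01), (10.10, 0.05), (9.99, 0.02)]

# Route 1: weight-normalize each equation 1 * x = b_i by dividing it by sigma_i.
rows = [(1.0 / s, b / s) for b, s in obs]          # entries of W A and W b
x_normalized = sum(a * rhs for a, rhs in rows) / sum(a * a for a, _ in rows)

# Route 2: weighted normal equations A^T C A x = A^T C b with c_i = 1 / sigma_i^2.
x_weighted = sum(b / s**2 for b, s in obs) / sum(1.0 / s**2 for _, s in obs)
```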

A last remark on weights. If you compare the given description with Example 10.3 
you notice that there is a slight difference. Our distance observations a priori are divided 
by the preliminary value for the distance. This leaves the opportunity to emphasize that 
any observation equation can be multiplied by any constant; but you must understand that 
this changes the weight and consequently the result of the least-squares problem. 
CONSTRAINTS FOR SINGULAR
NORMAL EQUATIONS 


12.1 Rank Deficient Normal Equations 


So far we have assumed that $A^TA$ is invertible. The columns of $A$ are independent. The
normal equations have a unique solution. However, in geodetic practice we also encounter
least-squares problems with dependent columns in $A$.

A basic example is a leveling network without any postulated heights. Then $Ax$
involves only differences of heights; there is an arbitrary constant in all the heights, which
cannot be determined. (It is removed when we postulate one height.) The constant height
vector $x = e = (1, 1, \ldots, 1)$ has differences $Ax = (0, 0, \ldots, 0)$. Then $A^TAx = 0$ and
$A^TA$ is not invertible.

The same is true for two- and three-dimensional control networks, for the same reason.
Still we can define a meaningful and unique solution, by fixing one or more heights.
We shall study both geometrical and statistical properties for these types of solutions. They
are defined via extensive use of the pseudoinverse matrix $A^+$.

For these rank deficient matrices $A$, there does not exist any unbiased linear estimator
$\hat x = Pb$. This would require $E\{\hat x\} = PAx = x$ for all $x$, or $PA = I$ which is
impossible. ($Ax = 0$ would require $Ix = 0$.) But there do exist linear functions of $x$ which
allow unbiased estimates (expected value equal to true mean). Basically, we must have no
component in the direction of the singular vector $(1, 1, \ldots, 1)$. Our discussion focuses on
suitable choices of such linear functions and how to interpret them geometrically.

In general, the purpose of using a least-squares procedure for geodetic networks is 
to determine coordinates of the unknown points. Points with postulated coordinates keep 
these values; the network is “pin-pointed” at those points. 

Although the least-squares estimation of such a network furnishes us with a covari- 
ance matrix for the coordinates of the new points, this covariance matrix is greatly influ- 
enced by the presence and distribution of the postulated points. So the covariance matrix 
tells less about the general features of the network, because it also reflects possible internal 
errors between the known coordinates. 


In 1962 Meissl proposed a least-squares procedure for geodetic networks that allowed
for singular normal equations combined with certain linear constraints. The idea
was to consider all network points of equal status. Then points with postulated coordinates 
can be assigned changes to their coordinates too. 

There exists a close connection between the constrained least-squares problem and a 
similarity transformation of the same network. 


12.2 Representations of the Nullspace 


We start by deriving the singular vectors connected to the most common types of geodetic 
observations in case of a rank deficient matrix A. They solve Ax = 0. 


Leveling Network Incidence matrices take differences. The nullspace contains
$$e = (1, 1, \ldots, 1). \tag{12.1}$$


Two-Dimensional Control Network A translation of the network in the $x$ direction adds
a common constant to all $x$ coordinates. If the vector of unknowns is arranged as follows

$$x = (X_1, Y_1, X_2, Y_2, \ldots, X_n, Y_n) \tag{12.2}$$
then the nullspace contains $x$ translations and $y$ translations:
$$e_x = (1, 0, 1, 0, \ldots, 1, 0) \quad\text{and}\quad e_y = (0, 1, 0, 1, \ldots, 0, 1). \tag{12.3}$$


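Furthermore the network can be rotated by an angle $\varphi$. We shall show in detail how to
linearize the equations of a differential two-dimensional rotational transformation:

$$\begin{aligned} X_i' &= \cos\varphi\, X_i - \sin\varphi\, Y_i\\ Y_i' &= \sin\varphi\, X_i + \cos\varphi\, Y_i. \end{aligned} \tag{12.4}$$

Here $(X_i, Y_i)$ is a set of given coordinates which the rotation transforms to $(X_i', Y_i')$. We
linearize and keep only the first order terms ($\sin d\varphi \approx d\varphi$ and $\cos d\varphi \approx 1$):

$$\begin{aligned} X_i + \xi_i &= 1\cdot X_i - d\varphi\, Y_i, & \xi_i &= -Y_i\, d\varphi\\ Y_i + \eta_i &= d\varphi\, X_i + 1\cdot Y_i, & \eta_i &= X_i\, d\varphi. \end{aligned}$$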


This small rotation gives rise to yet another vector in the nullspace of $A$:
$$e_\varphi = (-Y_1, X_1, -Y_2, X_2, \ldots, -Y_n, X_n). \tag{12.5}$$
Finally a change in scale $dk$ is described by the vector

$$e_k = (X_1, Y_1, X_2, Y_2, \ldots, X_n, Y_n). \tag{12.6}$$
Our network is determined up to two translations, one rotation, and one change of scale.
The nullspace $N(A)$ is spanned by the four rows of

$$G^T = \begin{bmatrix}
1 & 0 & 1 & 0 & \cdots & 1 & 0\\
0 & 1 & 0 & 1 & \cdots & 0 & 1\\
-Y_1 & X_1 & -Y_2 & X_2 & \cdots & -Y_n & X_n\\
X_1 & Y_1 & X_2 & Y_2 & \cdots & X_n & Y_n
\end{bmatrix}. \tag{12.7}$$
The differential parameters are $df = (dt_x, dt_y, d\varphi, dk)$ and the transformation is
$$dx = (\xi_1, \eta_1, \ldots, \xi_n, \eta_n) = G\,df. \tag{12.8}$$
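
Which of the four rows of $G^T$ actually lie in the nullspace depends on the observation type. The sketch below, with hypothetical approximate coordinates and illustrative helper names, builds the rows of (12.7) and multiplies them by one linearized distance observation. The translations and the rotation are annihilated, while the scale vector is not: its product equals the distance itself, which is why a pure distance network has only a three-dimensional nullspace.

```python
import math

# Hypothetical approximate coordinates (X_i, Y_i) of three network points.
pts = [(170.71, 170.71), (270.71, 170.71), (100.00, 241.42)]

def nullspace_rows(pts):
    """Rows of G^T in (12.7): x translation, y translation, rotation, scale."""
    ex, ey, ephi, ek = [], [], [], []
    for X, Y in pts:
        ex += [1.0, 0.0]
        ey += [0.0, 1.0]
        ephi += [-Y, X]
        ek += [X, Y]
    return [ex, ey, ephi, ek]

def distance_row(pts, i, j):
    """One row of A: the linearized distance observation between points i and j."""
    dX, dY = pts[j][0] - pts[i][0], pts[j][1] - pts[i][1]
    d = math.hypot(dX, dY)
    row = [0.0] * (2 * len(pts))
    row[2 * i], row[2 * i + 1] = -dX / d, -dY / d
    row[2 * j], row[2 * j + 1] = dX / d, dY / d
    return row, d

row, d = distance_row(pts, 0, 2)
# Inner products of the observation row with e_x, e_y, e_phi, e_k.
products = [sum(a * g for a, g in zip(row, g_row)) for g_row in nullspace_rows(pts)]
```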


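Three-Dimensional Control Network Three-dimensional networks can be subject to
three infinitesimal translations $dt_x, dt_y, dt_z$, three infinitesimal rotations $d\varphi_x, d\varphi_y, d\varphi_z$,
and one infinitesimal change of scale $dk$. The nullvectors are the rows of

$$G^T = \begin{bmatrix}
1 & 0 & 0 & \cdots & 1 & 0 & 0\\
0 & 1 & 0 & \cdots & 0 & 1 & 0\\
0 & 0 & 1 & \cdots & 0 & 0 & 1\\
0 & -Z_1 & Y_1 & \cdots & 0 & -Z_n & Y_n\\
Z_1 & 0 & -X_1 & \cdots & Z_n & 0 & -X_n\\
-Y_1 & X_1 & 0 & \cdots & -Y_n & X_n & 0\\
X_1 & Y_1 & Z_1 & \cdots & X_n & Y_n & Z_n
\end{bmatrix} \tag{12.9}$$
with
$$df = (dt_x, dt_y, dt_z, d\varphi_x, d\varphi_y, d\varphi_z, dk). \tag{12.10}$$
The columns of $G$ (rows of $G^T$) span $N(A)$ and we have
$$AG = 0. \tag{12.11}$$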


We close this section by suggesting a geometrical interpretation. The first three rows
of $G^Tx = g$ mean that the origin is fixed. For numerical reasons we translate the origin
to the barycenter and mark coordinates relative to the barycenter with an $*$:

$$\sum_{i=1}^n \xi_i = g_1, \qquad \sum_{i=1}^n \eta_i = g_2, \qquad \sum_{i=1}^n \zeta_i = g_3.$$


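The next three equations in $G^Tx = g$ lead to fixed rotations:

$$\sum_{i=1}^n (-Z_i^*\eta_i + Y_i^*\zeta_i) = g_4, \qquad \sum_{i=1}^n (Z_i^*\xi_i - X_i^*\zeta_i) = g_5, \qquad \sum_{i=1}^n (-Y_i^*\xi_i + X_i^*\eta_i) = g_6.$$

The final condition $\sum_i (X_i^*\xi_i + Y_i^*\eta_i + Z_i^*\zeta_i) = g_7$ secures that scale is kept fixed.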
Table 12.1 Possible G-columns spanning the nullspace of A, see Teunissen (1985a).
The number of unknowns at each node is denoted u, and d is the dimension of N(A)

Actual observation type(s)                                     Components of G-column(s)

height differences
distances and azimuths
distances
distances, azimuths, astronomical latitudes and longitudes
distances
angles and/or distance ratios


12.3 Constraining a Rank Deficient Problem 


When $A$ is rank deficient there are an infinite number of solutions (differing by the solutions
of $Ax = 0$) to the singular least-squares problem

$$\min \|b - Ax\|_2.$$
We look for a unique solution $\hat x$ with the additional constraints
$$G^Tx = g. \tag{12.12}$$

The rows of the $d$ by $n$ matrix $G^T$ span the $d$-dimensional $N(A)$. Then $A$ augmented by the
$d$ new rows from $G^T$ has full rank $n$.
We formulate the problem as one consisting of singular normals $A^TA$ with an
orthogonal bordering matrix $G$:
$$\begin{bmatrix} A^TA & G\\ G^T & 0 \end{bmatrix} \begin{bmatrix} x\\ 0 \end{bmatrix} = \begin{bmatrix} A^Tb\\ g \end{bmatrix}. \tag{12.13}$$

The augmentation of $A^TA$ by $G^T$ makes the coefficient matrix invertible. The solution is
unique, and it can be expressed in terms of the pseudoinverse $A^+$:

$$\hat x^+ = A^+b. \tag{12.14}$$
Remark 12.1 There is another formulation of the problem. Let the normal equations be
$$A^TAx = A^Tb.$$

$A^TA$ is still singular and nonnegative definite. In order to make the problem uniquely
solvable we add a suitable set of $d$ fictitious observation equations

$$Fx = g \tag{12.15}$$

and suppose these observation equations are weight normalized such that the normals become

$$(A^TA + F^TF)x = A^Tb + F^Tg.$$


Such an addition to the problem is called a soft postulation. We shall see how the inverse 
of the coefficient matrix depends on this soft postulation. 

If the fictitious observations are given infinite weight—hard postulation, which im- 
plies that (12.15) is strictly enforced—then those are regarded as condition equations and 
not as observation equations. 

We demonstrate the technique on the transformation (12.37) below and modify it for
a soft postulation as follows

$$\Sigma_{\text{transformed}} = S(\Sigma_{\hat x} + GG^T)S^T \tag{12.16}$$
according to Krarup (1979).


Example 12.1 Let the oriented graph in Figure 12.1 and height differences along the
edges be given. The incidence matrix corresponding to this graph is

$$A = \begin{bmatrix}
-1 & 0 & 0 & 1 & 0\\
-1 & 0 & 0 & 0 & 1\\
0 & 0 & -1 & 1 & 0\\
0 & -1 & 0 & 0 & 1\\
0 & 0 & 0 & 1 & -1
\end{bmatrix}$$
with the observations
$$b = \begin{bmatrix} 1.978\\ 0.732\\ 0.988\\ 0.420\\ 1.258 \end{bmatrix}.$$
Figure 12.1 Graph for oriented free leveling network.


The matrix $A^TA$ is singular. A unique solution can be calculated by means of the
pseudoinverse matrix $A^+$. This matrix is often determined by means of the SVD for $A$. This is
the decomposition $A = U\Sigma V^T$ and from here $A^+ = V\Sigma^+U^T$. The unique least-squares
solution of minimum norm is

$$\hat x^+ = A^+b = \begin{bmatrix} -0.8024\\ -0.4944\\ 0.1916\\ 1.1796\\ -0.0744 \end{bmatrix}.$$

The same solution $\hat x = \hat x^+$ can be achieved by augmenting $A^TA$ by the row $e^T$ and the
column $e$ of all ones:
$$\begin{bmatrix} A^TA & e\\ e^T & 0 \end{bmatrix} \begin{bmatrix} \hat x\\ 0 \end{bmatrix} = \begin{bmatrix} A^Tb\\ 0 \end{bmatrix}.$$

Note that the mean value of $\hat x = \hat x_0$ is zero. This gives the last equation $e^T\hat x = 0$. To find
the solution we did not form the normal equations but rather used the SVD.

If we want to change the solution $\hat x_0$ from "0-level" to a level such as $l = 20$, we
simply have to change $g$ to $5 \times 20 = 100$ and we obtain the solution

$$\hat x_{20} = \begin{bmatrix} 19.1976\\ 19.5056\\ 20.1916\\ 21.1796\\ 19.9256 \end{bmatrix}.$$

The mean value of this solution is 20 because $e^T\hat x_{20} = 100$.
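
The bordered system can also be solved directly, without an SVD. The following is a minimal sketch in plain Python; the elimination routine `solve` and the wrapper `bordered_solution` are illustrative helpers, not the book's M-files. With $g = 0$ it reproduces $\hat x^+$ and with $g = 100$ it reproduces $\hat x_{20}$.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, M[i])) + [float(rhs[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x

# Incidence matrix and observed height differences from Example 12.1.
A = [[-1, 0, 0, 1, 0],
     [-1, 0, 0, 0, 1],
     [0, 0, -1, 1, 0],
     [0, -1, 0, 0, 1],
     [0, 0, 0, 1, -1]]
b = [1.978, 0.732, 0.988, 0.420, 1.258]
m, n = len(A), len(A[0])

def bordered_solution(g):
    """Solve [[A^T A, e], [e^T, 0]] [x; z] = [A^T b; g] and return x."""
    N = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    M = [N[i] + [1.0] for i in range(n)] + [[1.0] * n + [0.0]]
    return solve(M, Atb + [g])[:n]

x0 = bordered_solution(0.0)      # mean height 0
x20 = bordered_solution(100.0)   # mean height 20
```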
Example 12.2 We want to demonstrate how a projector can bring $\hat x_{20}$ from Example 12.1
back to $\hat x_0$. In this special case we project onto the plane perpendicular to $e$:

$$P = I - e(e^Te)^{-1}e^T = I - \tfrac15 ee^T = \frac15 \begin{bmatrix}
4 & -1 & -1 & -1 & -1\\
-1 & 4 & -1 & -1 & -1\\
-1 & -1 & 4 & -1 & -1\\
-1 & -1 & -1 & 4 & -1\\
-1 & -1 & -1 & -1 & 4
\end{bmatrix}$$

or
$$\hat x_0 = P\hat x_{20} = P \begin{bmatrix} 19.1976\\ 19.5056\\ 20.1916\\ 21.1796\\ 19.9256 \end{bmatrix} = \begin{bmatrix} -0.8024\\ -0.4944\\ 0.1916\\ 1.1796\\ -0.0744 \end{bmatrix}.$$
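
The projection of Example 12.2 amounts to subtracting the mean height. A short sketch in plain Python, using the values printed above:

```python
# Solution at level 20 from Example 12.1.
x20 = [19.1976, 19.5056, 20.1916, 21.1796, 19.9256]
n = len(x20)

# P = I - e e^T / n projects onto the plane perpendicular to e = (1, ..., 1).
P = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
x0 = [sum(P[i][j] * x20[j] for j in range(n)) for i in range(n)]
```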


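Example 12.3 Continuing Example 12.1 we shall calculate the covariance matrix for the
pseudoinverse solution. We shall determine the standard deviation (of unit weight)

$$\hat r = b - A\hat x^+ = b - AA^+b = (I - AA^+)b \quad\text{and}\quad \hat\sigma_0 = \|\hat r\|_2 = 0.0069 = 6.9\,\text{mm}.$$
Thus the covariance matrix $\Sigma_+ = \hat\sigma_0^2 A^+(A^+)^T$ is

$$\Sigma_+ = \begin{bmatrix}
0.192 & -0.096 & -0.096 & 0.000 & 0.000\\
-0.096 & 0.416 & -0.224 & -0.128 & 0.032\\
-0.096 & -0.224 & 0.416 & 0.032 & -0.128\\
0.000 & -0.128 & 0.032 & 0.128 & -0.032\\
0.000 & 0.032 & -0.128 & -0.032 & 0.128
\end{bmatrix} \times 10^{-4}$$

with $\operatorname{tr}(\Sigma_+) = 1.280 \times 10^{-4}$. Next we have

$$B = \begin{bmatrix} A\\ e^T \end{bmatrix}$$

and $\Sigma = \hat\sigma_0^2(B^TB)^{-1}$ becomes

$$\Sigma = \begin{bmatrix}
0.211 & -0.077 & -0.077 & 0.019 & 0.019\\
-0.077 & 0.435 & -0.205 & -0.109 & 0.051\\
-0.077 & -0.205 & 0.435 & 0.051 & -0.109\\
0.019 & -0.109 & 0.051 & 0.147 & -0.013\\
0.019 & 0.051 & -0.109 & -0.013 & 0.147
\end{bmatrix} \times 10^{-4}.$$

Finally $\operatorname{tr}(\Sigma) = 1.376 \times 10^{-4}$. Obviously $\operatorname{tr}(\Sigma_+) < \operatorname{tr}(\Sigma)$.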
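
The two traces can be reproduced with a few lines of plain Python. This is a sketch under two assumptions: the helper `inverse` is an illustrative Gauss-Jordan routine, and $(A^TA)^+$ is read off as the upper left block of the inverse of the bordered matrix from Example 12.1, which holds because $e$ spans the nullspace of $A^TA$.

```python
def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, M[i])) + [1.0 if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

A = [[-1, 0, 0, 1, 0], [-1, 0, 0, 0, 1], [0, 0, -1, 1, 0],
     [0, -1, 0, 0, 1], [0, 0, 0, 1, -1]]
b = [1.978, 0.732, 0.988, 0.420, 1.258]
m, n = len(A), len(A[0])

N = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
bordered = [N[i] + [1.0] for i in range(n)] + [[1.0] * n + [0.0]]
N_plus = [row[:n] for row in inverse(bordered)[:n]]   # pseudoinverse of A^T A

Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
x_plus = [sum(N_plus[i][j] * Atb[j] for j in range(n)) for i in range(n)]
r = [b[k] - sum(A[k][j] * x_plus[j] for j in range(n)) for k in range(m)]
sigma0_sq = sum(v * v for v in r) / (m - 4)           # rank(A) = 4, one redundancy

trace_plus = sigma0_sq * sum(N_plus[i][i] for i in range(n))
BtB = [[N[i][j] + 1.0 for j in range(n)] for i in range(n)]   # B^T B = A^T A + e e^T
BtB_inv = inverse(BtB)
trace_B = sigma0_sq * sum(BtB_inv[i][i] for i in range(n))
```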


Example 12.4 This is the example of a free network. We augment the original $n = 2$
unknowns to include the unknowns of all other points in the network. Hence the number
of unknowns increases to 8. As $\operatorname{rank}(A) = 8 - 3 = 5$ we have to include at least two
more observations compared to Example 10.1. For reasons of symmetry we include even
three more observations, viz. distance observations between points 001-002, 002-003, and
003-001. The vector of unknowns is

$$x = (X_P, Y_P, X_{001}, Y_{001}, X_{002}, Y_{002}, X_{003}, Y_{003});$$

the augmented coefficient matrix is

$$A = \begin{bmatrix}
-1 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
0.707 & 0.707 & 0 & 0 & -0.707 & -0.707 & 0 & 0\\
0.707 & -0.707 & 0 & 0 & 0 & 0 & -0.707 & 0.707\\
0 & 0 & 0.924 & 0.383 & -0.924 & -0.383 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 1\\
0 & 0 & 0.924 & -0.383 & 0 & 0 & -0.924 & 0.383
\end{bmatrix}.$$
We choose a nullspace of dimension 4:
$$G^T = \begin{bmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0\\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1\\
-170.71 & 170.71 & -170.71 & 270.71 & -100 & 100 & -241.42 & 100\\
170.71 & 170.71 & 270.71 & 170.71 & 100 & 100 & 100 & 241.42
\end{bmatrix}$$
and the right side is
$$b = \begin{bmatrix} 0.010\\ 0.020\\ 0.030\\ 0.010\\ 0.020\\ 0.030 \end{bmatrix}.$$

Figure 12.2 Free distance network with 6 observations. The confidence ellipses
correspond to the postulated coordinates $X_P$, $Y_P$, $X_1$, and $Y_1$.




The pseudoinverse solution $\hat x^+$ and the residuals $\hat r$ are

$$\hat x^+ = \begin{bmatrix} 0.0080\\ 0.0023\\ 0.0113\\ -0.0068\\ -0.0026\\ -0.0087\\ -0.0167\\ 0.0132 \end{bmatrix} \quad\text{and}\quad \hat r = \begin{bmatrix} 0.0067\\ 0.0047\\ 0.0047\\ -0.0036\\ -0.0020\\ -0.0036 \end{bmatrix}.$$

The standard deviation is $\hat\sigma_0 = 10.9$ mm. The least-squares solution of the free network
yields the following coordinates $(X_i', Y_i')$:

Point    X_i      xi_i     X_i'    |   Y_i     eta_i     Y_i'
P      170.71    0.008   170.718   |  170.71    0.002   170.712
1      270.71    0.011   270.721   |  170.71   -0.007   170.703
2      100.00   -0.003    99.997   |  100.00   -0.009    99.991
3      100.00   -0.017    99.983   |  241.42    0.013   241.433

Note that $\sum \xi_i = \sum \eta_i = 0$ within the computational accuracy.


12.4 Linear Transformation of Random Variables 


In the case where $A$ was singular we circumvented this problem by introducing the solution
vector of shortest length: $\hat x^+$. That was the effect of augmenting $A^TA$ and producing the
pseudoinverse.

We shall try to analyze this situation from a probabilistic point of view. The nonunique
solution was determined up to any additional solution from the nullspace. Any nonunique
solution vector $\hat x$ is connected to a covariance matrix $\Sigma_{\hat x}$. But as the vectors $\hat x$ differ, so
do the covariance matrices $\Sigma_{\hat x}$. Among all these possible covariance matrices we shall
demonstrate that the covariance matrix corresponding to the pseudoinverse solution is the
one with smallest trace. In statistical terms this means that the pseudoinverse solution $\hat x^+$
gives the smallest overall variances of the unknowns.


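Let
$$\underset{m \times 1}{v} = \underset{m \times n}{B}\;\underset{n \times 1}{u}$$
be given as a linear transformation between some random variables $u$ and $v$, and if
$E\{u\} = 0$ then $E\{uu^T\} = \Sigma_u$.
Now we want to approximate $u$ by $Av$ and substitute

$$\underset{n \times 1}{u} = \underset{n \times m}{A}\;\underset{m \times 1}{v} + \underset{n \times 1}{w}$$

where $w$ is some residual vector. We solve for $w$

$$w = u - Av = (I - AB)u. \tag{12.17}$$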
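The covariance matrix $\Sigma_w$ for $w$ is
$$\Sigma_w = (I - AB)\Sigma_u(I - AB)^T. \tag{12.18}$$

We want to minimize the trace of $\Sigma_w$, with $B$ being the variable matrix, and shall prove that

$$\operatorname{tr}\Sigma_w = \min \quad\text{for}\quad B = (A^TA)^{-1}A^T. \tag{12.19}$$
The results are

$$\operatorname{tr}\Sigma_w = \operatorname{tr}\Sigma_u - \operatorname{tr}\big((A^TA)^{-1}A^T\Sigma_uA\big)$$
and
$$\Sigma_w = \big(I - A(A^TA)^{-1}A^T\big)\Sigma_u\big(I - A(A^TA)^{-1}A^T\big)^T$$
and
$$\Sigma_v = (A^TA)^{-1}A^T\Sigma_uA(A^TA)^{-1}.$$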


This theorem shows a result which is different from that following from the usual
least-squares estimation of $v = Bu$. Ordinary least-squares estimation, minimizing the
square sum of weighted residuals, yields the result

$$B = (A^T\Sigma_u^{-1}A)^{-1}A^T\Sigma_u^{-1} \quad\text{where}\quad \operatorname{rank}\Sigma_u = n.$$

In the present theorem we minimize the sum of variances of residuals $w$; $\Sigma_u$ may have any
rank. When $\Sigma_u = I$ there is no difference between the two cases.


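Proof We start from (12.18) and get
$$\Sigma_w = \Sigma_u - AB\Sigma_u - \Sigma_uB^TA^T + AB\Sigma_uB^TA^T.$$
The first term is independent of $B$, so we get
$$\operatorname{tr}\Sigma_w = \text{const.} - \operatorname{tr}(\Sigma_uB^TA^T) - \operatorname{tr}(AB\Sigma_u) + \operatorname{tr}(AB\Sigma_uB^TA^T)$$

and
$$\frac{\partial \operatorname{tr}\Sigma_w}{\partial B} = -2A^T\Sigma_u + 2A^TAB\Sigma_u = 0$$

or $B = (A^TA)^{-1}A^T$ which proves the theorem.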


12.5 Similarity Transformations 


In the previous section we suspended the rank defect of $A^TA$ by a bordering matrix $G$.
At the same time we added constraints on the least-squares problem. These constraints
were also described by $G$. However, the constraints may be of a more general form. In this
section we shall introduce constraints which transform the free network to one with at least
$d$ postulated coordinates and we shall derive the pertinent formulas. The related covariance
matrix $\Sigma_{\text{transformed}}$ is correspondingly transformed, and consequently $d$ columns and rows
will be zeroed. We start by working through a simple example.
Figure 12.3 Simple free leveling network


Example 12.5 A simple leveling network has three nodes $P_1$, $P_2$, and $P_3$ and observed
oriented differences of heights $h_{12}$, $h_{23}$, and $h_{31}$. According to the loop law we have

$$h_{12} + h_{23} + h_{31} = 0. \tag{12.20}$$

We cannot calculate the heights of the nodes from these differences. So we choose $P_1$ as
reference point, i.e. we assign $P_1$ a given constant height $h_1$. The best estimate for $h_2$ and $h_3$ is
now unique

$$h_i = h_{1i} + h_1 = h_i^1 + h_1, \qquad i = 1, 2, 3 \tag{12.21}$$

where the upper index refers to the reference point chosen. Yet there is a slight flaw because
of the arbitrariness with which we choose $P_1$ as reference. In fact we can take any point
of the network as reference point. We can even choose the barycenter $P_b$ and assign it the
height

$$h_b = \frac1n \sum_{i=1}^n h_i.$$
Consequently we get
$$h_i = \tfrac13(h_{1i} + h_{2i} + h_{3i}) + h_b = h_i^b + h_b.$$

All sets of heights like $h^1$ or $h^b$ can be looked upon as valid heights. So it is more a
question of what set to choose. However, the statistical properties of the heights $h_i^1$ or $h_i^b$
depend highly on the choice of reference point. From $E\{h_3^1\} = E\{h_{12} + h_{23}\}$ and
$E\{h_3^b\} = E\{\tfrac13(h_{13} + h_{23} + h_{33})\}$ follows for instance that $E\{h_3^1\} \ne E\{h_3^b\}$. Their
covariances differ, too.

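When comparing two sets of heights it is essential that they refer to the same level of
reference. So in order to do this we shall learn how to transform heights from one system
to another. We write (12.21) explicitly

$$\begin{bmatrix} h_1\\ h_2\\ h_3 \end{bmatrix} = \begin{bmatrix} h_1^1\\ h_2^1\\ h_3^1 \end{bmatrix} + \begin{bmatrix} 1 & 0 & 0\\ 1 & 0 & 0\\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} h_1\\ h_2\\ h_3 \end{bmatrix} \tag{12.22}$$
or
$$\begin{bmatrix} h_1^1\\ h_2^1\\ h_3^1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0\\ -1 & 1 & 0\\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} h_1\\ h_2\\ h_3 \end{bmatrix}. \tag{12.23}$$

Equation (12.23) shows us how we can transform from one height system, say $h_i$, to the
height system defined by $P_1$ as reference point. In a similar way we obtain

$$\begin{bmatrix} h_1^b\\ h_2^b\\ h_3^b \end{bmatrix} = \frac13 \begin{bmatrix} 2 & -1 & -1\\ -1 & 2 & -1\\ -1 & -1 & 2 \end{bmatrix} \begin{bmatrix} h_1^1\\ h_2^1\\ h_3^1 \end{bmatrix}$$
and in general
$$\begin{bmatrix} h_1^b\\ h_2^b\\ \vdots\\ h_n^b \end{bmatrix} = \frac1n \begin{bmatrix} n-1 & -1 & \cdots & -1\\ -1 & n-1 & \cdots & -1\\ \vdots & & \ddots & \vdots\\ -1 & -1 & \cdots & n-1 \end{bmatrix} \begin{bmatrix} h_1^1\\ h_2^1\\ \vdots\\ h_n^1 \end{bmatrix}. \tag{12.24}$$

This square matrix is already seen for $n = 5$ in Example 12.2.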
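
A small numerical check of (12.23) and (12.24), with invented heights: referring the heights to $P_1$ first and then to the barycenter gives the same result as referring them to the barycenter directly.

```python
h = [10.0, 12.0, 15.0]                      # hypothetical heights h_1, h_2, h_3
n = len(h)

T1 = [[0, 0, 0], [-1, 1, 0], [-1, 0, 1]]    # matrix of (12.23): reference point P_1
h1 = [sum(T1[i][j] * h[j] for j in range(n)) for i in range(n)]

# Matrix of (12.24): (n - 1)/n on the diagonal and -1/n elsewhere.
Tb = [[((n - 1) if i == j else -1) / n for j in range(n)] for i in range(n)]
hb_from_h1 = [sum(Tb[i][j] * h1[j] for j in range(n)) for i in range(n)]
hb_from_h = [sum(Tb[i][j] * h[j] for j in range(n)) for i in range(n)]
```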


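The 2-dimensional transformation model must allow for two translations $t_x$, $t_y$, a
rotation $\varphi$ and a change of scale $k$. The transformation from one system $(x, y)$ (in geodesy
often termed the "from" system) to another $(\xi, \eta)$ (the "to" system) is given by

$$\begin{bmatrix} \xi_i\\ \eta_i \end{bmatrix} = k \begin{bmatrix} \cos\varphi & \sin\varphi\\ -\sin\varphi & \cos\varphi \end{bmatrix} \begin{bmatrix} x_i\\ y_i \end{bmatrix} + \begin{bmatrix} t_x\\ t_y \end{bmatrix}. \tag{12.25}$$

Equation (12.25) is not linear in $\varphi$, but by a trick we may make it linear. We introduce new
auxiliary variables $a = k\cos\varphi$ and $b = k\sin\varphi$ and get

$$\begin{bmatrix} \xi_i\\ \eta_i \end{bmatrix} = \begin{bmatrix} a & b\\ -b & a \end{bmatrix} \begin{bmatrix} x_i\\ y_i \end{bmatrix} + \begin{bmatrix} t_x\\ t_y \end{bmatrix}. \tag{12.26}$$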


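The observation equations resulting from $p$ common points are

$$\begin{bmatrix}
x_1 & y_1 & 1 & 0\\
x_2 & y_2 & 1 & 0\\
\vdots & \vdots & \vdots & \vdots\\
x_p & y_p & 1 & 0\\
y_1 & -x_1 & 0 & 1\\
y_2 & -x_2 & 0 & 1\\
\vdots & \vdots & \vdots & \vdots\\
y_p & -x_p & 0 & 1
\end{bmatrix}
\begin{bmatrix} a\\ b\\ t_x\\ t_y \end{bmatrix} =
\begin{bmatrix} \xi_1\\ \xi_2\\ \vdots\\ \xi_p\\ \eta_1\\ \eta_2\\ \vdots\\ \eta_p \end{bmatrix} -
\begin{bmatrix} r_{\xi_1}\\ r_{\xi_2}\\ \vdots\\ r_{\xi_p}\\ r_{\eta_1}\\ r_{\eta_2}\\ \vdots\\ r_{\eta_p} \end{bmatrix}$$
or symbolically
$$\underset{2p \times 4}{A}\;\underset{4 \times 1}{f} = \underset{2p \times 1}{b} - \underset{2p \times 1}{r}.$$
This linear least-squares problem does not show any difficulties.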
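Yet, we cannot resist demonstrating the classical solution procedure. We set all
weights to unity and the normal equations $A^TAf = A^Tb$ can be written

$$\begin{bmatrix}
\sum(x_i^2 + y_i^2) & 0 & \sum x_i & \sum y_i\\
0 & \sum(x_i^2 + y_i^2) & \sum y_i & -\sum x_i\\
\sum x_i & \sum y_i & p & 0\\
\sum y_i & -\sum x_i & 0 & p
\end{bmatrix}
\begin{bmatrix} a\\ b\\ t_x\\ t_y \end{bmatrix} =
\begin{bmatrix} \sum(\xi_ix_i + \eta_iy_i)\\ \sum(\xi_iy_i - \eta_ix_i)\\ \sum\xi_i\\ \sum\eta_i \end{bmatrix}. \tag{12.27}$$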


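All summations run from 1 to $p$. For numerical reasons we reduce all coordinates in both
systems to their respective barycenters $(x_s, y_s)$ and $(\xi_s, \eta_s)$. This leads to
$\sum x_i^* = \sum y_i^* = \sum \xi_i^* = \sum \eta_i^* = 0$ and $A^TA$ becomes diagonal. We introduce coordinates relative to the
barycenter $x_i^* = x_i - x_s$, $y_i^* = y_i - y_s$, and $x_s = \sum x_i/p$, $y_s = \sum y_i/p$.

Now the inversion reduces to solving individual equations:

$$\begin{bmatrix} * & 0 & 0 & 0\\ 0 & * & 0 & 0\\ 0 & 0 & p & 0\\ 0 & 0 & 0 & p \end{bmatrix} \begin{bmatrix} a\\ b\\ t_x\\ t_y \end{bmatrix} = \begin{bmatrix} *\\ *\\ 0\\ 0 \end{bmatrix}.$$
Nonzero entries are marked by $*$. The explicit solution can be written
$$\hat a = \frac{\sum(\xi_i^*x_i^* + \eta_i^*y_i^*)}{\sum(x_i^{*2} + y_i^{*2})} \quad\text{and}\quad \hat b = \frac{\sum(\xi_i^*y_i^* - \eta_i^*x_i^*)}{\sum(x_i^{*2} + y_i^{*2})}. \tag{12.28}$$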

Next, we can determine 


k=Ja2+62 and ¢=arctand/a. (12.29) 


Finally we get f and fy from (12.27): 


~  &-a4d x - bY y 


ip = OE = Ey — ix, — bys, (12.30) 
Pp 
A ;—â j b j A N 
ty = RU Santen = ns — ay, + bxs. (12.31) 
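The covariance matrix for the estimated $\hat f$ is
$$\Sigma_{\hat f} = \hat\sigma_0^2(A^TA)^{-1} = \frac{\hat\sigma_0^2}{p\sum(x_i^2 + y_i^2) - (\sum x_i)^2 - (\sum y_i)^2}
\begin{bmatrix}
p & 0 & -\sum x_i & -\sum y_i\\
0 & p & -\sum y_i & \sum x_i\\
-\sum x_i & -\sum y_i & \sum(x_i^2 + y_i^2) & 0\\
-\sum y_i & \sum x_i & 0 & \sum(x_i^2 + y_i^2)
\end{bmatrix}.$$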
The variance of unit weight, which also is the variance of a transformed coordinate, is
according to (9.64)

$$\hat\sigma_0^2 = \frac{\sum(r_{\xi_i}^2 + r_{\eta_i}^2)}{2p - 4}.$$
For the translations we have
$$\sigma_{t_x}^2 = \sigma_{t_y}^2 = \hat\sigma_0^2\,\frac{\sum(x_i^2 + y_i^2)}{p\sum(x_i^2 + y_i^2) - (\sum x_i)^2 - (\sum y_i)^2} = \hat\sigma_0^2\left(\frac1p + \frac{x_s^2 + y_s^2}{\sum(x_i^{*2} + y_i^{*2})}\right). \tag{12.32}$$

Observe that the standard deviations for $t_x$ and $t_y$ not only depend on $p$, as expected, but
also on the position of the origin.

Thus we have described the most important, elementary circumstances about the
similarity transformation. It includes $p$ points whose coordinates are given in two systems,
namely an original $x, y$ system and a different $\xi, \eta$ system. The transformation parameters
$k$, $\varphi$, $t_x$, and $t_y$ are estimated through a least-squares procedure. Additionally, the solution
makes it possible to transform any other point $(x_j, y_j)$ into the corresponding $(\xi_j, \eta_j)$ by
means of the transformation equations

$$\begin{aligned} \xi_j &= \hat ax_j + \hat by_j + \hat t_x\\ \eta_j &= -\hat bx_j + \hat ay_j + \hat t_y. \end{aligned} \tag{12.33}$$

Example 12.6 We shall demonstrate the procedure just described by a numerical example.
Suppose we know the exact coordinates $(N, E) = (\xi, \eta)$ and $(x, y)$ of 7 common points.
We want to determine the transformation parameters and to transform a single point with
given $x, y$ coordinates. The procedure is in fact a 2-dimensional interpolation.

To start we list the postulated coordinates. Point $s$ is the barycenter (its coordinates
are in the last line). The second table gives the same data reduced to the barycenter. The
latter are marked $*$.


Point            x             y        |       xi           eta
62-04-005   277722.022   -230855.152   |  6310000.527   562940.820
62-04-801   275956.869   -231105.839   |  6308231.260   562725.625
62-04-810   277563.374   -235447.400   |  6309749.964   558354.121
62-04-811   278608.525   -233945.915   |  6310824.656   559833.890
62-04-815   276163.682   -236471.626   |  6308330.475   557358.463
63-01-002   273578.801   -230941.425   |  6305857.705   562937.589
63-04-003   274533.958   -235063.723   |  6306729.799   558798.283
s           276303.890   -233404.440   |  6308532.055   560421.256

Point           x*          y*      |      xi*         eta*
62-04-005    1418.132    2549.288   |   1468.472    2519.564
62-04-801    -347.021    2298.601   |   -300.795    2304.369
62-04-810    1259.484   -2042.960   |   1217.909   -2067.135
62-04-811    2304.635    -541.475   |   2292.601    -587.366
62-04-815    -140.208   -3067.186   |   -201.580   -3062.793
63-01-002   -2725.089    2463.015   |  -2674.350    2516.333
63-04-003   -1769.932   -1659.283   |  -1802.256   -1622.973
Sum              .001        .000   |       .001        .001


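The necessary sums of products are

$$\sum x_i^{*2} = 19607592.00 \qquad \sum y_i^{*2} = 34476609.52$$
$$\sum x_i^*\eta_i^* = -4739029.25 \qquad \sum y_i^*\xi_i^* = -3655603.21$$
$$\sum x_i^*\xi_i^* = 19510390.19 \qquad \sum y_i^*\eta_i^* = 34545930.52$$

$$\hat a = 0.99948449 \qquad \hat b = 0.02003221$$
$$\hat k = 0.99968522 \qquad \hat\varphi = 1.275777 \text{ gon}$$

$$\hat t_x = \xi_s - \hat ax_s - \hat by_s = 6037046.208$$
$$\hat t_y = \eta_s + \hat bx_s - \hat ay_s = 799240.351.$$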


The transformation equations (12.33) from $(x_j, y_j)$ to $(\xi_j, \eta_j)$ include a rotation and
translation:

$$\begin{aligned} \xi_j &= 0.99948449\,x_j + 0.02003221\,y_j + 6037046.208\\ \eta_j &= -0.02003221\,x_j + 0.99948449\,y_j + 799240.351. \end{aligned}$$

A point with coordinates $(x, y) = (276109.847, -233507.185)$ is transformed to $(\xi, \eta) =
(6308336.054, 560322.451)$. The official values are $(6308336.054, 560322.449)$. Evidently,
the accuracy is satisfactory. This is partly due to the fact that the transformed point
is close to the barycenter. By means of the M-file simil we find $\hat\sigma_0 = 3$ mm.
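
The numbers of Example 12.6 can be reproduced from the reduced coordinates with the closed formulas (12.28) to (12.33). The following is a minimal sketch in plain Python and not the M-file simil; the variable names and the helper `transform` are illustrative.

```python
import math

# Barycenter-reduced coordinates from Example 12.6: (x*, y*, xi*, eta*).
pts = [
    (1418.132, 2549.288, 1468.472, 2519.564),
    (-347.021, 2298.601, -300.795, 2304.369),
    (1259.484, -2042.960, 1217.909, -2067.135),
    (2304.635, -541.475, 2292.601, -587.366),
    (-140.208, -3067.186, -201.580, -3062.793),
    (-2725.089, 2463.015, -2674.350, 2516.333),
    (-1769.932, -1659.283, -1802.256, -1622.973),
]
xs, ys = 276303.890, -233404.440        # barycenter in the (x, y) system
xis, etas = 6308532.055, 560421.256     # barycenter in the (xi, eta) system

den = sum(x * x + y * y for x, y, _, _ in pts)
a = sum(xi * x + eta * y for x, y, xi, eta in pts) / den    # (12.28)
b = sum(xi * y - eta * x for x, y, xi, eta in pts) / den    # (12.28)
k = math.hypot(a, b)                                         # (12.29)
phi_gon = math.atan2(b, a) * 200.0 / math.pi                 # rotation in gon
tx = xis - a * xs - b * ys                                   # (12.30)
ty = etas - a * ys + b * xs                                  # (12.31)

def transform(x, y):
    """Apply the transformation equations (12.33) to a point (x, y)."""
    return a * x + b * y + tx, -b * x + a * y + ty

xi_new, eta_new = transform(276109.847, -233507.185)
```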

For the sake of completeness we shall add that this procedure is called a Helmert 
transformation. A similar least-squares problem was mentioned by Helmert (1893). 


Symmetric Similarity Transformation The thoughtful reader might ask why we leave
$(x_i, y_i)$ of the common points unchanged under the transformation. The formulation favors
one point set above the other: Are the postulated coordinates $(x_i, y_i)$ much better than
$(\xi_i, \eta_i)$ so that they can be assumed free of errors? Actually those $(x_i, y_i)$ are transformed
as a "stiff" point set while the points $(\xi_i, \eta_i)$ adjust individually under the procedure. So
why not allow both point sets to adjust under the transformation? In practice the $(x_i, y_i)$
coordinates are worse than the newly calculated $(\xi_i, \eta_i)$. So the relevant formulation is to
augment (12.25) by two extra types of observation equations:

$$\begin{bmatrix} \xi_i\\ \eta_i\\ X_i\\ Y_i \end{bmatrix} = \begin{bmatrix} a & b\\ -b & a\\ 1 & 0\\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_i\\ y_i \end{bmatrix} + \begin{bmatrix} t_x\\ t_y\\ 0\\ 0 \end{bmatrix} \tag{12.34}$$

where $(\xi_i, \eta_i)$ and $(X_i, Y_i)$ denote the postulated coordinates of the common points in the
two systems. But a noteworthy change also has happened: the $p$ sets of $(x_i, y_i)$ have been
introduced as additional unknowns which shall be estimated together with the earlier 4
unknowns: $a$, $b$, $t_x$, $t_y$.

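We only quote the results given in Teunissen (1985b), 141-146. The scale is given
by a rather complicated expression $\hat k = \lambda + \sqrt{1 + \lambda^2}$ where

$$\begin{aligned}
\lambda &= \frac{\sum(\xi_i^{*2} + \eta_i^{*2}) - \sum(X_i^{*2} + Y_i^{*2})}{2\sqrt{\big(\sum(\xi_i^*X_i^* + \eta_i^*Y_i^*)\big)^2 + \big(\sum(\xi_i^*Y_i^* - \eta_i^*X_i^*)\big)^2}}\\
\hat\varphi &= \arctan\frac{\sum(\xi_i^*Y_i^* - \eta_i^*X_i^*)}{\sum(\xi_i^*X_i^* + \eta_i^*Y_i^*)}\\
\hat t_x &= \xi_s - X_s\hat k\cos\hat\varphi - Y_s\hat k\sin\hat\varphi\\
\hat t_y &= \eta_s + X_s\hat k\sin\hat\varphi - Y_s\hat k\cos\hat\varphi\\
\hat x_i &= X_s + \frac{1}{1 + \hat k^2}\big(X_i^* + \xi_i^*\hat k\cos\hat\varphi - \eta_i^*\hat k\sin\hat\varphi\big)\\
\hat y_i &= Y_s + \frac{1}{1 + \hat k^2}\big(Y_i^* + \xi_i^*\hat k\sin\hat\varphi + \eta_i^*\hat k\cos\hat\varphi\big).
\end{aligned} \tag{12.35}$$

A comparison of $\hat k = \lambda + \sqrt{1 + \lambda^2}$ with the expression $\hat k = \sqrt{\hat a^2 + \hat b^2}$, where $\hat a$ and $\hat b$ are
given by (12.28), shows that the latter expression systematically underestimates the scale
compared to the value coming from (12.35).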

When both sets of coordinates of the common points are adjusted the model is no
longer linear and we get a biased estimate for $k$. The estimates for the rotation $\hat\varphi$ and the
translations $\hat t_x$ and $\hat t_y$ are still unbiased. The bias for $k$ can be shown to be


ke a2 
E e (12.36) 


where the common points are placed in a square lattice with side length d. The expression 
(12.36) shows that the bias is negligible for most practical problems. If ô&o/d = 1075, 
k = 1, and n = 4 we have bj = 0.5 x 107. 

The example is interesting because even a simple nonlinear least-squares problem 
may introduce biased estimates for the unknowns. 




12.6 Covariance Transformations 


We repeat that any nonsingular least-squares problem results in estimated coordinates $\hat x$
and an a posteriori covariance matrix $\Sigma_{\hat x}$. This result implies the following two options:


— The coordinate system is defined by postulating four quantities. They can be: (1) All 
four coordinates of two points. (2) Two coordinates of one point, one orientation, and 
one distance. (3) Four linear functions of the coordinates. This coordinate system is 
the reference in which all other coordinates are calculated. The transition from one 
coordinate system to another is accomplished by a similarity transformation. 


— The variances of the four fixed quantities are set to zero to serve as reference for 
confidence ellipses at all other points. The transition of one covariance system to 
another may be performed in two ways: 


1 Indirectly by repeating the above procedure with newly selected reference co- 
variances. In many cases it is not of interest to transform the coordinate system 
itself, leaving the coordinate values unchanged. 


2 Directly by selecting a member of the family of covariance transformations
$$\Sigma_{\text{transformed}} = S\,\Sigma_{\hat x}\,S^T \tag{12.37}$$
having one covariance matrix available.


Some of those choices are of a subjective nature and follow from practical circumstances. 

The matrix $S$ can be derived from simple geometric considerations. As usual the
least squares model is $Ax = b$. The components of $x$ do not necessarily represent
coordinates. But what we like to find are unbiased estimable linear functions of $x$ which can be
interpreted as coordinates. We denote them by $x^*$. Unbiased estimability implies that $x^*$
is the expected value of a linear function of $b$: $x^* = E\{Bb\}$ or

$$x^* = BAx. \tag{12.38}$$

In Example 12.5 we already encountered a sample of $x^* = E\{Bb\}$, namely the heights $h_1^b$,
$h_2^b$, and $h_3^b$ which are linear functions of the observed height differences.

Now we concentrate on two dimensional networks. The coordinates $x^*$ should be
transformable to another coordinate system $x$ by adding the contribution from a differential
similarity transformation (12.8):

$$x = x^* + Gf. \tag{12.39}$$
This is rewritten $f = G^+(x - x^*)$ and substituted into (12.39): $x = x^* + GG^+(x - x^*)$ or
$$(I - GG^+)x = (I - GG^+)x^* \tag{12.40}$$

which is a consistency condition on the transformation. As $G$ has full rank we have
$G^+ = (G^TG)^{-1}G^T$. So the projector $S$ is defined as

$$S = I - G(G^TG)^{-1}G^T. \tag{12.41}$$
Geometrically $S$ projects onto a subspace $R(S)$ complementary to $N(A)$ and along the
nullspace $N(A) = R(G)$. The matrix to invert is of small dimension equal to the dimension
of $N(A)$.

If we want to achieve minimum trace of $\Sigma_{\hat x}$ for $p$ common points of the network,
$2 \le p \le n/2$, we split the $G$ matrix into two: $G_1$ containing the rows pertaining to the $p$
selected common points and $G_2$ containing the rest:

$$G = \begin{bmatrix} G_1\\ G_2 \end{bmatrix}, \qquad G_1 \text{ is } 2p \text{ by } d, \quad G_2 \text{ is } (n - 2p) \text{ by } d.$$
We assume that the coordinates for the common points are on the top of $x$ (the rows of $G$
may have to be interchanged). In this more general case we shall calculate $S = I - GG^+$,
i.e. we must find an adequate expression for $G^+$.


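We postulate that $G^+ = [\,G_1^+ \;\; 0\,]$ and shall now prove this is correct. We start by
the identity


$$ GG^+G = \begin{bmatrix} G_1 \\ G_2 \end{bmatrix} \begin{bmatrix} G_1^+ & 0 \end{bmatrix} \begin{bmatrix} G_1 \\ G_2 \end{bmatrix} = \begin{bmatrix} G_1G_1^+ & 0 \\ G_2G_1^+ & 0 \end{bmatrix} \begin{bmatrix} G_1 \\ G_2 \end{bmatrix} = \begin{bmatrix} G_1G_1^+G_1 \\ G_2G_1^+G_1 \end{bmatrix}. $$
According to the definition of a pseudoinverse we have $G_1G_1^+G_1 = G_1$, and $G_2G_1^+G_1 =
G_2(G_1^TG_1)^{-1}G_1^TG_1 = G_2$. Therefore


$$ GG^+G = \begin{bmatrix} G_1 \\ G_2 \end{bmatrix} = G. $$


Thus $G^+ = [\,G_1^+ \;\; 0\,] = [\,(G_1^TG_1)^{-1}G_1^T \;\; 0\,]$ is the correct pseudoinverse.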
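The transformation matrix for $p$ common points with minimum variance is


$$ S_p = I - GG^+ = I - \begin{bmatrix} G_1 \\ G_2 \end{bmatrix} \begin{bmatrix} (G_1^TG_1)^{-1}G_1^T & 0 \end{bmatrix} = I - \begin{bmatrix} G_1(G_1^TG_1)^{-1}G_1^T & 0 \\ G_2(G_1^TG_1)^{-1}G_1^T & 0 \end{bmatrix}. $$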

Still we only have to invert a d by d matrix. For 2p = n we recover (12.41). 
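A special and important case is $p = 2$. We get $d$ zero variances:


$$ S_p = I - \begin{bmatrix} I & 0 \\ G_2G_1^{-1} & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ -G_2G_1^{-1} & I \end{bmatrix}. \qquad (12.42) $$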


Example 12.7 We continue Example 12.4 and want to transform the adjusted network so
that the coordinates $X_P$, $Y_P$, $X_1$, and $Y_1$ are kept fixed. This is achieved by


1 0 —Yp 

= 0 1 XP 
CLS la G er 
0 1 XxX] 
Hence
0 0 0 0 0 0 0 
0 0 0 0 0 0 O 
0 0 0 0 0 0 O 
0 0 0 0 O 
= | eee 
—1.707 1 O 0Ọ0 O 
—0.707 0 1 0 0 
—1.707 0 O 1 O 
0.707 0 0 O 1 
The transformed solution is
$$ \hat{x}_p = S_p\hat{x} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0.0019 \\ -0.0150 \\ -0.0288 \\ 0.0022 \end{bmatrix}. $$
The covariance matrix for the transformed solution is
$$ \Sigma_{\hat{x}_p} = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.0046 & -0.0030 & -0.0063 & -0.0039 \\
0 & 0 & 0 & 0 & -0.0030 & 0.0061 & 0.0040 & 0.0049 \\
0 & 0 & 0 & 0 & -0.0063 & 0.0039 & 0.0046 & 0.0030 \\
0 & 0 & 0 & 0 & -0.0039 & 0.0049 & 0.0030 & 0.0061
\end{bmatrix}. $$
The confidence ellipses are shown in Figure 12.2. Remember the units of $\Sigma_{\hat{x}_p}$ are m$^2$. The
confidence ellipses at points 002 and 003 in the figure are a tenth of true size.


12.7 Variances at Control Points 


Repeatedly we have used the term “postulated coordinate.” In daily terms this is a “given 
value” or a “fixed value.” Even if these values are treated computationally as real numbers, 
most often they are created through a previous least-squares procedure and consequently 


are born with nonzero variances. 


For practical reasons (or lack of knowledge) the covariances are often set to zero. 
Today this attitude to the problem is changing. We shall demonstrate a model that correctly 
treats nonzero covariances of postulated coordinates. An excellent reference is Schwarz 


(1994), and we shall benefit from it. 
Frequently, the estimated observations are more accurate than the observed values.
The coordinates estimated from a constrained network are more accurate than those esti- 
mated from a free network. We shall investigate the statistical implications of this. 

We start by partitioning the usual observation equation $Ax = b - r$ into


$$ \begin{bmatrix} A_1 \\ A_2 \\ A_3 \end{bmatrix} x = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} - r. \qquad (12.43) $$


The observations described by $A_1$ are usual geodetic observations as treated in Section 10.2.
The matrix $A_2$ contains the necessary rows to span the nullspace of $A_1$, cf. Section 12.2.
Finally $A_3$ contains coordinate observations as described in Section 10.2. The
observation vector $b$ is partitioned conformally, and the covariance matrices are denoted
$\Sigma_1$, $\Sigma_2$, and $\Sigma_3$.

In case a least-squares problem involves “postulated values” with zero variance we
have $\Sigma_2 = \Sigma_3 = 0$.

When the third group of observations $A_3$ is missing, we are left with a free problem
indicated by a subscript $f$. The estimate for $x$ is called $\hat{x}_f$ with covariance matrix $\Sigma_f$. If we
subsequently add the third set of observations $A_3$ we have to constrain the solution $\hat{x}_f$ with
the latter equation: $A_3x = b_3 - r_3$. For reasons of curiosity we shall carry the derivation
through in all detail.

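Let the first two observation types be combined, so that the problem is now formulated as


$$ \begin{bmatrix} A \\ A_3 \end{bmatrix} x = \begin{bmatrix} b \\ b_3 \end{bmatrix} - \begin{bmatrix} r \\ r_3 \end{bmatrix}. \qquad (12.44) $$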


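We introduce Lagrange multipliers $\lambda$ and $\lambda_3$ and define the extended function
$$ L(r, r_3, x, \lambda, \lambda_3) = \tfrac12 r^T\Sigma^{-1}r + \tfrac12 r_3^T\Sigma_3^{-1}r_3 + \lambda^T(Ax - b + r) + \lambda_3^T(A_3x - b_3 + r_3). $$


The partial derivatives of $L$ with respect to the variables are put equal to zero:


$$ \frac{\partial L}{\partial r} = \Sigma^{-1}r + \lambda = 0 \qquad (12.45) $$
$$ \frac{\partial L}{\partial r_3} = \Sigma_3^{-1}r_3 + \lambda_3 = 0 \qquad (12.46) $$
$$ \frac{\partial L}{\partial x} = A^T\lambda + A_3^T\lambda_3 = 0 \qquad (12.47) $$
$$ \frac{\partial L}{\partial \lambda} = Ax - b + r = 0 \qquad (12.48) $$
$$ \frac{\partial L}{\partial \lambda_3} = A_3x - b_3 + r_3 = 0. \qquad (12.49) $$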


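The residuals are obtained from (12.45) and (12.46):


$$ r = -\Sigma\lambda \qquad (12.50) $$
$$ r_3 = -\Sigma_3\lambda_3. \qquad (12.51) $$


Combining (12.50) and (12.48) yields


$$ Ax - b - \Sigma\lambda = 0 $$
or
$$ \lambda = \Sigma^{-1}(Ax - b). \qquad (12.52) $$
From (12.47) and (12.52), and furthermore from (12.49) and (12.51), we get


$$ A^T\Sigma^{-1}(Ax - b) + A_3^T\lambda_3 = 0 \qquad (12.53) $$
$$ A_3x - b_3 - \Sigma_3\lambda_3 = 0. \qquad (12.54) $$
In matrix form this is
$$ \begin{bmatrix} A^T\Sigma^{-1}A & A_3^T \\ A_3 & -\Sigma_3 \end{bmatrix} \begin{bmatrix} x \\ \lambda_3 \end{bmatrix} = \begin{bmatrix} A^T\Sigma^{-1}b \\ b_3 \end{bmatrix}. \qquad (12.55) $$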


This equation shows how the normal equation matrix of the first group must be augmented
in order to find the solution of both groups together. Of course, it is possible to solve for $x$
and $\lambda_3$, but we prefer a recursive solution for the parameters. (In so doing we in fact derive
the Kalman gain matrix and prediction of the covariance matrix in filtering, developed in
Chapter 17.)

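Next we need to know the inverse of a 2 by 2 block matrix. Let


$$ \mathcal{A} = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} $$


where $A$ and $C$ are square but $B$ can be rectangular. The inverse block matrix is


$$ \mathcal{A}^{-1} = \begin{bmatrix} A^{-1} + A^{-1}B(C - B^TA^{-1}B)^{-1}B^TA^{-1} & -A^{-1}B(C - B^TA^{-1}B)^{-1} \\ -(C - B^TA^{-1}B)^{-1}B^TA^{-1} & (C - B^TA^{-1}B)^{-1} \end{bmatrix} = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}. $$


This is easily verified by direct multiplication of $\mathcal{A}$ with $\mathcal{A}^{-1}$. The matrix $R$ is introduced
for reasons of reference.
Hence the solution of (12.55) can be written


$$ x = R_{11}A^T\Sigma^{-1}b + R_{12}b_3 \qquad (12.56) $$
$$ \lambda_3 = R_{21}A^T\Sigma^{-1}b + R_{22}b_3. \qquad (12.57) $$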


We seek a recursive solution so we introduce $\hat{x}_f = (A^T\Sigma^{-1}A)^{-1}A^T\Sigma^{-1}b$ and $\Sigma_f =
(A^T\Sigma^{-1}A)^{-1}$ for the solution of the first equation in (12.44). After lengthy calculations
we get the (updated) constrained estimate


$$ \hat{x}_c = \hat{x}_f + \Sigma_fA_3^T(\Sigma_3 + A_3\Sigma_fA_3^T)^{-1}(b_3 - A_3\hat{x}_f). \qquad (12.58) $$
The covariance matrix of the constrained estimate $\hat{x}_c$ is recognized as
$$ \Sigma_c = \Sigma_f - \Sigma_fA_3^T(\Sigma_3 + A_3\Sigma_fA_3^T)^{-1}A_3\Sigma_f. \qquad (12.59) $$
A nonnegative definite matrix is subtracted from $\Sigma_f$, so


$$ \Sigma_c \le \Sigma_f. \qquad (12.60) $$


This means that the variance of any scalar function of $\hat{x}_c$ is less than or equal to the variance
of the same function evaluated for $\hat{x}_f$.
Intuitively, adding the new information $b_3$ to an already existing set of observations
cannot worsen things, and most often it improves things.
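
The update (12.58) and (12.59) is easiest to follow with a single unknown. The sketch below is an illustration under stated assumptions, not the derivation's own code: it assumes one unknown height, one added coordinate observation (so $A_3$ and $\Sigma_3$ are scalars), and made-up numbers; the function name `constrained_update` is hypothetical.

```python
def constrained_update(x_f, var_f, a3, b3, var3):
    """Scalar form of (12.58) and (12.59): one unknown, one added observation."""
    innovation_var = var3 + a3 * var_f * a3   # Sigma_3 + A_3 Sigma_f A_3^T
    gain = var_f * a3 / innovation_var        # Sigma_f A_3^T (...)^{-1}
    x_c = x_f + gain * (b3 - a3 * x_f)        # constrained estimate (12.58)
    var_c = var_f - gain * a3 * var_f         # constrained variance (12.59)
    return x_c, var_c


# Hypothetical numbers: free estimate 10.000 m with variance 4e-4 m^2,
# coordinate observation 10.030 m with variance 1e-4 m^2.
x_c, var_c = constrained_update(10.000, 4.0e-4, 1.0, 10.030, 1.0e-4)

# A "postulated value" corresponds to var3 = 0: the estimate is forced onto b3.
x_fixed, var_fixed = constrained_update(10.000, 4.0e-4, 1.0, 10.030, 0.0)
```

With these numbers the gain is 0.8, so the estimate moves four fifths of the way toward the new observation and the variance drops to one fifth of its free value, in line with (12.60).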

Until now a constrained network seems as good as or even better than a free network. 
This is only true if the least-squares procedure uses the correct covariance matrix. This 
assumption does not always hold when we fix the control points. These points are almost 
never known perfectly. So next we want to study the effect of fixing control points to values
that are not optimal.

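We start with the following model:


$$ \begin{bmatrix} A_1 & A_2 \\ 0 & I \end{bmatrix} \begin{bmatrix} x \\ x^0 \end{bmatrix} = \begin{bmatrix} b \\ b^0 \end{bmatrix} - \begin{bmatrix} r \\ r^0 \end{bmatrix} \qquad (12.61) $$


with covariance matrix


$$ \begin{bmatrix} \Sigma & 0 \\ 0 & \Sigma^0 \end{bmatrix}. \qquad (12.62) $$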


The natural way to proceed would be a least-squares procedure for the whole problem. This 
is most often not done because it changes the coordinates of the control points. Besides the 
adjustment task often is so comprehensive that in daily life it never will be done. 

So more often $r^0$ is set to zero, and we substitute $x^0 = b^0$ into (12.61) to obtain


$$ A_1x_n = b - A_2b^0 - r. \qquad (12.63) $$
This equation is to be compared to (8.39): $A_2x = b - A_1x^0 - r$. The solution of (12.63) is
$$ \hat{x}_n = (A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1}(b - A_2b^0). \qquad (12.64) $$


This estimate is influenced by two error sources: the observational errors in $b$ and the errors
in the postulated coordinates $b^0$. They are described by the covariance matrices $\Sigma$ and $\Sigma^0$.

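We want to find the covariance matrix for the estimate $\hat{x}_n$. We know the covariance
matrices of $b$ and $b^0$ so we calculate the partial derivatives


$$ \begin{bmatrix} \dfrac{\partial\hat{x}_n}{\partial b} & \dfrac{\partial\hat{x}_n}{\partial b^0} \end{bmatrix} = \begin{bmatrix} (A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1} & -(A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1}A_2 \end{bmatrix}. $$
Now apply the law of covariance propagation and get


$$ \Sigma_{\hat{x}_n} = \begin{bmatrix} (A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1} & -(A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1}A_2 \end{bmatrix} \begin{bmatrix} \Sigma & 0 \\ 0 & \Sigma^0 \end{bmatrix} \begin{bmatrix} \Sigma^{-1}A_1(A_1^T\Sigma^{-1}A_1)^{-1} \\ -A_2^T\Sigma^{-1}A_1(A_1^T\Sigma^{-1}A_1)^{-1} \end{bmatrix} $$
$$ = (A_1^T\Sigma^{-1}A_1)^{-1} + (A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1}A_2\,\Sigma^0A_2^T\Sigma^{-1}A_1(A_1^T\Sigma^{-1}A_1)^{-1}. \qquad (12.65) $$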
The first term specifies the contribution from the observations $b$. This might be called
the internal error. The second term describes the external error coming from the control
points. So we can write (12.65) as


$$ \Sigma_{\hat{x}_n} = \Sigma_{\text{internal}} + \Sigma_{\text{external}}. \qquad (12.66) $$


This equation explains statistically why control networks sometimes become inadequate.
Usually the control network is supposed to be much more accurate than the new densification
network. This means that $\Sigma^0$ should be so small compared to $\Sigma$ that the second term in
(12.65) is much smaller than the first term. As long as this is the case $\Sigma_{\hat{x}_n} = (A_1^T\Sigma^{-1}A_1)^{-1}$
can be used as a reasonable approximation of (12.65).

So far this fits the traditional manner of evaluating control networks. It is supposed
that the accuracy of the control network is better than that of the new observations. Then
we can use (12.65) with only the first term on the right side. If the accuracy of the new
observations approaches or exceeds that of the existing control points it is essential to
include both terms from (12.65).
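
The split (12.66) into internal and external error can be tried on a toy case. The sketch below is an assumption-laden illustration, not an example from the text: one new height is determined from two leveled height differences to two control points, so $A_1$ is a single column, and all numbers are hypothetical. The function name `covariance_fixed_control` is also hypothetical.

```python
def covariance_fixed_control(a1, A2, var_b, Sigma0):
    """Internal and external terms of (12.65) for ONE unknown.

    a1     : the single column of A_1 (one entry per observation)
    A2     : rows of A_2 (one row per observation, one column per control point)
    var_b  : variances of the uncorrelated observations (diagonal of Sigma)
    Sigma0 : covariance matrix of the postulated coordinates b^0
    """
    w = [1.0 / v for v in var_b]                      # diagonal of Sigma^{-1}
    N = sum(wi * ai * ai for wi, ai in zip(w, a1))    # A_1^T Sigma^{-1} A_1
    internal = 1.0 / N
    m = len(Sigma0)
    # K = (A_1^T Sigma^{-1} A_1)^{-1} A_1^T Sigma^{-1} A_2, here a 1 by m row
    K = [internal * sum(w[i] * a1[i] * A2[i][j] for i in range(len(a1)))
         for j in range(m)]
    external = sum(K[j] * Sigma0[j][k] * K[k]
                   for j in range(m) for k in range(m))
    return internal, external


# Hypothetical setup: b_1 = x - h_A and b_2 = x - h_B, each with variance 1e-4.
internal, external = covariance_fixed_control(
    a1=[1.0, 1.0],
    A2=[[-1.0, 0.0], [0.0, -1.0]],
    var_b=[1.0e-4, 1.0e-4],
    Sigma0=[[0.0100, 0.0075], [0.0075, 0.0100]],
)
```

In this made-up case the external term exceeds the internal one by more than two orders of magnitude, which is the situation the paragraph above warns about: the reported variance is far too optimistic if only the first term is used.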

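We can also determine the effect on the estimated observations when the control
points are held fixed. We have


$$ \hat{b} = A_1\hat{x}_n + A_2b^0 = A_1(A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1}b + \bigl(I - A_1(A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1}\bigr)A_2b^0. \qquad (12.67) $$


Again the covariance matrix consists of two terms:
$$ \Sigma_{\hat{b}} = A_1(A_1^T\Sigma^{-1}A_1)^{-1}A_1^T + \bigl(I - A_1(A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1}\bigr)A_2\Sigma^0A_2^T\bigl(I - A_1(A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1}\bigr)^T. \qquad (12.68) $$
If the second term vanishes we are left with the usual expression
$$ \Sigma_{\hat{b}} = A_1(A_1^T\Sigma^{-1}A_1)^{-1}A_1^T. \qquad (12.69) $$


Now the difference between the covariance matrix of the actual observations and that of
the estimated observations is


$$ \Sigma_b - \Sigma_{\hat{b}} = \Sigma_b - A_1(A_1^T\Sigma^{-1}A_1)^{-1}A_1^T = \bigl(I - A_1(A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1}\bigr)\Sigma_b\bigl(I - A_1(A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1}\bigr)^T. \qquad (12.70) $$
This is a nonnegative definite matrix, and we have
$$ \Sigma_{\hat{b}} \le \Sigma_b. \qquad (12.71) $$


That is, the estimated observations $\hat{b}$ have a smaller variance than the actual observations.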

If the second term in (12.65) does not vanish, equation (12.71) does not necessarily 
hold. Nowadays it often happens that the estimated observations have larger variances 
than the actual observations. In other words, if we fix the control points we might cause 
the adjusted values of the observations to be worse than the actually observed values. This 
argument applies especially when we try to fit GPS vectors into existing control networks. 
Equation (12.65) also holds for a free network where we only add the necessary
conditions to define the coordinate system. The usual least squares leads to the estimate
$\Sigma_{\hat{x}} = (A_1^T\Sigma^{-1}A_1)^{-1}$. This covariance matrix tells us how well the coordinates of the new
points are estimated, but not how well they are known relative to the control points. The
second term in (12.65) accounts for the uncertainty of the fixed control.

For a free network the columns of $A_2$ are linear combinations of the columns of $A_1$,
say $A_2 = A_1H$ for some matrix $H$. Then


$$ (A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1}A_2 = (A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1}A_1H = H. $$
Equation (12.65) becomes
$$ \Sigma_{\hat{x}_n} = (A_1^T\Sigma^{-1}A_1)^{-1} + H\Sigma^0H^T. \qquad (12.72) $$
Even more interesting, we then have


$$ \bigl(I - A_1(A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1}\bigr)A_2 = A_1H - A_1(A_1^T\Sigma^{-1}A_1)^{-1}A_1^T\Sigma^{-1}A_1H = 0 \qquad (12.73) $$


so that the second term in (12.68) vanishes. This means that $\Sigma_{\hat{b}} \le \Sigma_b$ holds for all free
networks, irrespective of the uncertainty of the fixed control. The coordinates obtained
from a least-squares solution of a free network may be affected by the errors in the fixed
control, but the adjusted observations are not. This is the sense in which these
least-squares solutions are “free.”


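Example 12.8 This theory is now applied to a leveling network. We only need to know
the covariance matrix for the postulated heights


$$ \Sigma^0 = \begin{bmatrix} 0.0100 & 0.0075 & 0.0075 \\ 0.0075 & 0.0100 & 0.0075 \\ 0.0075 & 0.0075 & 0.0100 \end{bmatrix}. $$


We recall the data from Example 8.1:


$$ b = \begin{bmatrix} 1.978 \\ 0.732 \\ 0.988 \\ 0.420 \\ 1.258 \end{bmatrix} \quad\text{and}\quad b^0 = \begin{bmatrix} 10.021 \\ 10.321 \\ 11.002 \end{bmatrix}. $$


The inverse covariance matrix ($\sigma_0 = 1$) for the observations $b$ is


$$ \Sigma^{-1} = \operatorname{diag}(0.9804,\; 1.0309,\; 0.9009,\; 0.9346,\; 1.1236). $$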
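The observation equations (12.63) are $A_1x_n = b - A_2b^0 - r$:


$$ \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} H_D \\ H_E \end{bmatrix} = \begin{bmatrix} 1.978 \\ 0.732 \\ 0.988 \\ 0.420 \\ 1.258 \end{bmatrix} - \begin{bmatrix} -1 & 0 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 10.021 \\ 10.321 \\ 11.002 \end{bmatrix} - r. $$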

The solution is 


A - = = H 11.9976 
ên = (AEWA) ATE Mb Aad) =| |= fioa 


and the covariance matrix is 


bow 5 0.008 7 
Ly = 


0.217 0.079 < 1074 0.0074 0.008 0 
0.0080 0.008 4 l 


0.079 0.211 0.0080 0.008 4 


We recognize that the last term, the variance of the postulated heights, by far dominates
the result. The variances of the new points D and E are much larger than the variances
of the observations themselves (computed in Example 8.1). The two heights $H_D$ and $H_E$
are strongly correlated, too, since they share the uncertainties of the control points A, B,
and C.


According to (12.68) the covariance matrix of the estimated observations is


$$ \Sigma_{\hat{b}} = \begin{bmatrix}
-0.0001 & 0.0004 & 0.0009 & -0.0011 & -0.0006 \\
0.0004 & 0.0008 & 0.0000 & -0.0013 & -0.0004 \\
0.0009 & 0.0000 & -0.0020 & 0.0010 & 0.0009 \\
-0.0011 & -0.0013 & 0.0010 & 0.0017 & 0.0002 \\
-0.0006 & -0.0004 & 0.0009 & 0.0002 & -0.0002
\end{bmatrix} $$


$$ = \begin{bmatrix}
0.217 & 0.079 & 0.217 & 0.079 & 0.138 \\
0.079 & 0.211 & 0.079 & 0.211 & -0.132 \\
0.217 & 0.079 & 0.217 & 0.079 & 0.138 \\
0.079 & 0.211 & 0.079 & 0.211 & -0.132 \\
0.138 & -0.132 & 0.138 & -0.132 & 0.269
\end{bmatrix} \times 10^{-4} $$


$$ + \begin{bmatrix}
-0.0002 & 0.0004 & 0.0009 & -0.0011 & -0.0006 \\
0.0004 & 0.0008 & 0.0000 & -0.0013 & -0.0004 \\
0.0009 & 0.0000 & -0.0020 & 0.0010 & 0.0009 \\
-0.0011 & -0.0013 & 0.0010 & 0.0016 & 0.0002 \\
-0.0006 & -0.0004 & 0.0009 & 0.0002 & -0.0002
\end{bmatrix}. $$


The uncertainty of the control points A, B, and C, reflected in the second term, dominates
this covariance matrix. Recall that the covariance matrix of the actual observations
is


D= = 0.62 x 1074. 


0.50 


We observe that the second term causes the covariance matrix of the estimated observations
$\Sigma_{\hat{b}}$ to be larger than the covariance matrix of the actual observations $\Sigma_b$.


13 


PROBLEMS WITH EXPLICIT 
SOLUTIONS 


13.1 Free Stationing as a Similarity Transformation 


Any free stationing can be perceived and calculated as a usual least-squares problem where 
the observations are distances and horizontal directions. 

Normally, the distance and direction observations are taken in pairs. Yet, this is not 
required for carrying out the procedure. 

Alternatively, the situation can be conceived in the following way: Introduce a local
coordinate system $x, y$ with origin at the free stationing point $P$ and $x$-axis through the first
object in the horizontal direction set. Any observation of distance and direction from $P$ to
a known point can be looked upon as a polar determination $(\theta, s)$ of this point in the $x, y$
coordinate system. The task is now to transform the locally observed $x, y$ coordinates for
the given points into $\xi, \eta$ coordinates in a global system. The translational parameters $t_\xi, t_\eta$
of the transformation, and the rotation $\varphi$ of the $x, y$ system relative to the $\xi, \eta$ system are
the 3 unknowns.

Furthermore, we may introduce a change of scale $k$ between the two coordinate systems.
Then the similarity transformation works on unknowns $t_\xi$, $t_\eta$, $\varphi$, and $k$. The
coordinates for $P$ are $(t_\xi, t_\eta)$ and the orientation unknown is $\varphi$.

We shall illustrate the theory by a calculational example, using the observations and
coordinates presented in Table 13.1. We reduce the coordinates by $x_i^* = x_i - x_s, \ldots$, and
get the values in Table 13.2.

In Section 12.5 the classical formulas for the least-squares solution of the problem
were described. According to this, the first step is to calculate product sums:


$$ \sum (x_i^{*2} + y_i^{*2}) = 18\,219.017\,32 $$
$$ \sum \xi_i^*x_i^* = 88.4855 \qquad \sum \eta_i^*x_i^* = -13\,575.232\,19 $$
$$ \sum \xi_i^*y_i^* = 3681.640\,980 \qquad \sum \eta_i^*y_i^* = -5926.219\,065 $$
Table 13.1 Observations and given coordinates for free stationing


         | Observed horizontal | Reduced      | Cartesian coordinates | Given coordinates
Station  | reading θ_i         | distance s_i | x_i       y_i         | ξ_i       η_i
         | [gon]               | [m]          | [m]       [m]         | [m]       [m]

         |   0.000             |  51.086      |  51.086     0.000     |  60.10   −926.73
         | 107.548             |  50.771      |  −6.006    50.415     | 126.16   −888.81
         | 191.521             |  65.249      | −64.671     8.664     | 105.38   −819.88
         | 231.709             | 110.362      | −96.953   −52.725     |  57.60   −769.61

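Next the unknowns are given by simpler formulas which emerge because of the translation
to the barycenter:

$$ \hat{a} = \frac{\sum \xi_i^*x_i^* + \sum \eta_i^*y_i^*}{\sum (x_i^{*2} + y_i^{*2})} = \frac{88.4855 - 5926.219\,065}{18\,219.017\,32} = -0.320\,420 $$


$$ \hat{b} = \frac{-\sum \eta_i^*x_i^* + \sum \xi_i^*y_i^*}{\sum (x_i^{*2} + y_i^{*2})} = \frac{13\,575.232\,19 + 3681.640\,980}{18\,219.017\,32} = 0.947\,190 $$


$$ \hat{k} = \sqrt{\hat{a}^2 + \hat{b}^2} = 0.999\,919 = 1 - 81 \text{ ppm} $$


$$ \hat{\varphi} = 120.767 \text{ gon}. $$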

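We calculate $t_\xi$ and $t_\eta$ according to (12.30) and (12.31); in this special case we furthermore
have $t_\xi = \xi_P$ and $t_\eta = \eta_P$. With $x_s = -29.136$ and $y_s = 1.589$:


$$ \xi_P = \xi_s - \hat{a}x_s - \hat{b}y_s = 87.310 - (-0.320\,420)(-29.136) - 0.947\,190 \cdot 1.589 = 76.470 \text{ m} $$
$$ \eta_P = \eta_s + \hat{b}x_s - \hat{a}y_s = -851.258 + 0.947\,190 \cdot (-29.136) - (-0.320\,420) \cdot 1.589 = -878.346 \text{ m}. $$


Example 10.3 uses the same data and results in $(\xi_P, \eta_P) = (76.469, -878.344)$.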
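
The computation above can be retraced in a few lines. The sketch below is not the book's M-file; it is a minimal Python illustration that takes the four coordinate pairs of Table 13.1, forms the barycentric product sums, and evaluates $\hat{a}$, $\hat{b}$, $\hat{k}$, the translations, and the variance factor with $2p - 4$ degrees of freedom. The function name `similarity_fit` is hypothetical, and using `atan2(b, a)` to fix the quadrant of the rotation is an assumption that happens to agree with the printed 120.767 gon.

```python
import math

# Table 13.1: local Cartesian (x, y) and given (xi, eta) coordinates in metres.
LOCAL = [(51.086, 0.000), (-6.006, 50.415), (-64.671, 8.664), (-96.953, -52.725)]
GIVEN = [(60.10, -926.73), (126.16, -888.81), (105.38, -819.88), (57.60, -769.61)]


def similarity_fit(local, given):
    """Least-squares similarity transformation via barycentric product sums."""
    p = len(local)
    xs = sum(x for x, _ in local) / p
    ys = sum(y for _, y in local) / p
    xis = sum(xi for xi, _ in given) / p
    etas = sum(eta for _, eta in given) / p

    sxx = sxi_x = seta_x = sxi_y = seta_y = 0.0
    for (x, y), (xi, eta) in zip(local, given):
        dx, dy, dxi, deta = x - xs, y - ys, xi - xis, eta - etas
        sxx += dx * dx + dy * dy
        sxi_x += dxi * dx
        seta_x += deta * dx
        sxi_y += dxi * dy
        seta_y += deta * dy

    a = (sxi_x + seta_y) / sxx
    b = (-seta_x + sxi_y) / sxx
    k = math.hypot(a, b)
    phi_gon = math.atan2(b, a) * 200.0 / math.pi   # assumed quadrant convention
    t_xi = xis - a * xs - b * ys
    t_eta = etas + b * xs - a * ys

    # Residuals as in (13.1), variance factor with 2p - 4 degrees of freedom.
    ssq = 0.0
    for (x, y), (xi, eta) in zip(local, given):
        ssq += (xi - a * x - b * y - t_xi) ** 2
        ssq += (eta + b * x - a * y - t_eta) ** 2
    sigma0 = math.sqrt(ssq / (2 * p - 4))
    return a, b, k, phi_gon, t_xi, t_eta, sigma0


a, b, k, phi_gon, t_xi, t_eta, sigma0 = similarity_fit(LOCAL, GIVEN)
```

Run on the tabulated data, the sketch reproduces the values quoted in the text to the printed precision, including a standard deviation of unit weight of about 14 mm.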
As in all least-squares problems we are furnished with a residual of each observation.
In the present case we can interpret these quantities. The residual vector $\hat{r} = b - A\hat{x}$ is
$$ \begin{aligned} r_{\xi_i} &= \xi_i - \hat{a}x_i - \hat{b}y_i - t_\xi \\ r_{\eta_i} &= \eta_i + \hat{b}x_i - \hat{a}y_i - t_\eta. \end{aligned} \qquad (13.1) $$
In our numerical example the residual vectors are shown in Table 13.2. The pair $(r_{\xi_i}, r_{\eta_i})$
can be looked upon as a residual vector between the transformed point $(x_i, y_i)$ and the
“observed” point $(\xi_i, \eta_i)$. The variance factor is $\hat{\sigma}_0^2 = (\sum r_{\xi_i}^2 + \sum r_{\eta_i}^2)/(2p - 4) = (14\text{ mm})^2$.


Table 13.2 Coordinate sets reduced to respective barycenters and final residuals


      x_i*       y_i*    |    ξ_i*       η_i*
     80.222     −1.589   |  −27.210    −75.472
     23.130     48.826   |   38.850    −37.552
    −35.535      7.075   |   18.070     31.378
    −67.817    −54.314   |  −29.710     81.648

Sum   0.000     −0.002   |    0.000      0.002


Remark 13.1 The present result deviates a little from the one resulting from a usual
least-squares procedure as described in Example 10.3. Looked upon as an ordinary
least-squares problem we have 3 unknowns but the similarity transformation per se involves
4 unknowns. There is another difference between the two procedures. In the similarity
transformation all observations get equal weights while in the ordinary least-squares
procedure they get varying weights.

In case of the similarity transformation nothing can prevent us from introducing vary- 
ing weights for the observations except that the formulas change so much that the proce- 
dure is no longer an attractive alternative. 


Remark 13.2 Above we started by converting the polar observations $(\theta_{P_i}, s_{P_i})$ into Cartesian
coordinates. Is it possible to directly evaluate the polar observations? Actually, it is
possible and the observation equations are


$$ \begin{aligned} \xi_i &= \xi_P + k\,s_{P_i}\cos(\theta_{P_i} + z_P) \\ \eta_i &= \eta_P + k\,s_{P_i}\sin(\theta_{P_i} + z_P). \end{aligned} \qquad (13.2) $$


They are nonlinear and hence the similarity transformation must be solved iteratively starting
with preliminary values for $\xi_P$, $\eta_P$, $k \approx 1$ and $z_P \approx 0$. The orientation unknown is
denoted $z_P$ as in Section 13.3.


Figure 13.1 The free stationing
13.2 Optimum Choice of Observation Site


The coordinates $(X, Y)$ of the observation site $P$ in a free stationing are determined by
observations of distances $s_1, s_2, \ldots, s_n$ and directions $\varphi_1, \varphi_2, \ldots, \varphi_n$. For every distance
observation a direction observation is taken and vice versa. The set of $n$ directions is
connected to the coordinate axes through the orientation unknown $Z$, see Figure 13.2.
The unknowns of the least squares problem are the coordinates $(X, Y)$ of point $P$
and the orientation unknown $Z$. We introduce preliminary coordinates $(X^0, Y^0)$ for $P$
and put $X = X^0 + x$ and $Y = Y^0 + y$. The coordinates of the $i$th fixed point are called
$(X_i, Y_i)$. As observation we use the observed distance $s_i$ divided by the calculated distance
$s_i^0 = \sqrt{(X_i - X^0)^2 + (Y_i - Y^0)^2}$ and we have


$$ \frac{s_i}{s_i^0} = \frac{\sqrt{(X_i - X)^2 + (Y_i - Y)^2}}{s_i^0} = f_i(X, Y). $$


Si Si 


In a first order linearization we have


$$ \frac{s_i}{s_i^0} + r_i = f_i(X^0, Y^0) + \Bigl(\frac{\partial f_i}{\partial X}\Bigr)_0 x + \Bigl(\frac{\partial f_i}{\partial Y}\Bigr)_0 y. $$


Remember $f_i(X^0, Y^0) = s_i^0/s_i^0 = 1$. We differentiate and rearrange


$$ r_i = -u_ix - v_iy - \Bigl(\frac{s_i}{s_i^0} - 1\Bigr) $$


where we have introduced the coefficients $u_i = (X_i - X^0)/(s_i^0)^2$ and $v_i = (Y_i - Y^0)/(s_i^0)^2$.

The weight for the $i$th observation is $c_i = m/(\sigma_s^2 + \sigma_c^2)$ where $m$ denotes the number of
repeated observations, $\sigma_s^2$ is the variance of the distance and $\sigma_c^2$ is the variance of centricity.
The observation equation for the horizontal direction is


$$ A_{P_i} = \arctan\frac{Y_i - Y}{X_i - X} = \varphi_i + Z. $$


We split the value for the orientation unknown $Z$ into $Z^0 + z$ and linearize
$$ r_i = v_ix - u_iy - z - (-A_{P_i}^0 + \varphi_i + Z^0) = v_ix - u_iy - 1 \cdot z - b_i. $$


The weight is given as $d_i = m/(\sigma_r^2 + (\sigma_c\,\omega/s_i)^2)$ where the variance of a direction observed
with one set is called $\sigma_r^2$. The factor $\omega$ converts from units of arc to radians and $m$ is the
number of sets.

We rewrite the expressions for the coefficients $u_i$ and $v_i$, and introduce the direction
angle $A_{P_i}$:


$$ u_i = \frac{X_i - X^0}{(s_i^0)^2} = \frac{\cos A_{P_i}}{s_i^0} \quad\text{and}\quad v_i = \frac{Y_i - Y^0}{(s_i^0)^2} = \frac{\sin A_{P_i}}{s_i^0}. $$
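Figure 13.2 Free stationing at P with n fixed points.


In the sequel we omit the superscript $^0$. Note that $u_i^2 + v_i^2 = s_i^{-2}$. Next we gather all linearized
observation equations in the following matrix equation:
$$ \begin{bmatrix} -u_1 & -v_1 & 0 \\ \vdots & \vdots & \vdots \\ -u_n & -v_n & 0 \\ v_1 & -u_1 & -1 \\ \vdots & \vdots & \vdots \\ v_n & -u_n & -1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = b - r. \qquad (13.3) $$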


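We assemble the weights into the diagonal matrix
$$ C = \operatorname{diag}(c_1, \ldots, c_n, d_1, \ldots, d_n) $$
and the normal equation matrix becomes


$$ A^TCA = \begin{bmatrix} \sum (c_iu_i^2 + d_iv_i^2) & \sum (c_i - d_i)u_iv_i & -\sum d_iv_i \\ \sum (c_i - d_i)u_iv_i & \sum (c_iv_i^2 + d_iu_i^2) & \sum d_iu_i \\ -\sum d_iv_i & \sum d_iu_i & \sum d_i \end{bmatrix}. \qquad (13.4) $$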

It is rather complicated to evaluate this matrix expression. However we may simplify by
setting $c_i = d_i = 1$. For a modern total station this is in fact reasonable. We now obtain a
much simpler expression
$$ A^TCA = \begin{bmatrix} \sum s_i^{-2} & 0 & -\sum v_i \\ 0 & \sum s_i^{-2} & \sum u_i \\ -\sum v_i & \sum u_i & n \end{bmatrix}. $$


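We are aiming at the trace of the covariance matrix $\Sigma = (A^TCA)^{-1}$. We shall use the
following formula


$$ \begin{bmatrix} S & T \\ T^T & U \end{bmatrix}^{-1} = \begin{bmatrix} S^{-1} + S^{-1}T(U - T^TS^{-1}T)^{-1}T^TS^{-1} & -S^{-1}T(U - T^TS^{-1}T)^{-1} \\ -(U - T^TS^{-1}T)^{-1}T^TS^{-1} & (U - T^TS^{-1}T)^{-1} \end{bmatrix}. \qquad (13.5) $$


This is easily verified by direct multiplication of $\Sigma$ with $\Sigma^{-1}$. We want to evaluate the (1,1)
entry of $\Sigma$ and start with the expression in parenthesis. Here $S = (\sum s_i^{-2})I$,
$T^T = [\,-\sum v_i \;\; \sum u_i\,]$, and $U = n$, so


$$ U - T^TS^{-1}T = n - \frac{(\sum u_i)^2 + (\sum v_i)^2}{\sum s_i^{-2}}. $$


The inverse is
$$ (U - T^TS^{-1}T)^{-1} = \frac{\sum s_i^{-2}}{n\sum s_i^{-2} - (\sum u_i)^2 - (\sum v_i)^2}. $$


Furthermore
$$ T(U - T^TS^{-1}T)^{-1}T^TS^{-1} = \frac{1}{n\sum s_i^{-2} - (\sum u_i)^2 - (\sum v_i)^2} \begin{bmatrix} (\sum v_i)^2 & -\sum u_i\sum v_i \\ -\sum u_i\sum v_i & (\sum u_i)^2 \end{bmatrix}. $$
Finally we get

$$ \Sigma_{xy} = S^{-1}\bigl(I + T(U - T^TS^{-1}T)^{-1}T^TS^{-1}\bigr) = \frac{1}{\sum s_i^{-2}\,\bigl(n\sum s_i^{-2} - (\sum u_i)^2 - (\sum v_i)^2\bigr)} \begin{bmatrix} n\sum s_i^{-2} - (\sum u_i)^2 & -\sum u_i\sum v_i \\ -\sum u_i\sum v_i & n\sum s_i^{-2} - (\sum v_i)^2 \end{bmatrix}. \qquad (13.6) $$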
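The square sum $(\sum u_i)^2 + (\sum v_i)^2$ can be evaluated as follows


$$ \Bigl(\sum u_i\Bigr)^2 + \Bigl(\sum v_i\Bigr)^2 = (u_1 + u_2 + \cdots + u_n)^2 + (v_1 + v_2 + \cdots + v_n)^2 $$
$$ = u_1^2 + \cdots + u_n^2 + 2\sum_{i<j} u_iu_j + v_1^2 + \cdots + v_n^2 + 2\sum_{i<j} v_iv_j $$
$$ = \sum s_i^{-2} + 2\sum_{i<j} \frac{\cos A_{P_i}\cos A_{P_j} + \sin A_{P_i}\sin A_{P_j}}{s_is_j} $$
$$ = \sum s_i^{-2} + 2\sum_{i<j} \frac{\cos(A_{P_i} - A_{P_j})}{s_is_j} = \sum s_i^{-2} + 2\sum_{i<j} \frac{\cos\nu_{ij}}{s_is_j}. $$


The angle between the directions of distances $s_i$ and $s_j$ is denoted $\nu_{ij}$. The first sum contains
$n$ terms and the last sum $n(n - 1)/2$ terms. The final formula is


$$ \operatorname{tr}\Sigma = \Bigl(\sum s_i^{-2}\Bigr)^{-1} \left(1 + \frac{n\sum s_i^{-2}}{(n - 1)\sum s_i^{-2} - 2\sum_{i<j} \dfrac{\cos\nu_{ij}}{s_is_j}}\right). \qquad (13.7) $$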


In this type of plane geometric formulas one often can substitute some areas or fractions
of areas. In the present case we introduce the area $T_{ij}$ of triangle $P, i, j$ and get


$$ \operatorname{tr}\Sigma = \Bigl(\sum s_i^{-2}\Bigr)^{-1} \left(1 + \frac{n\sum s_i^{-2}}{(n - 1)\sum s_i^{-2} - \sum_{i<j} \dfrac{\sin 2\nu_{ij}}{2T_{ij}}}\right). $$


If $\nu_{ij}$ equals 0 or $\pi$, then $\sin 2\nu_{ij}/2T_{ij}$ must be interpreted as 0.

It is not easy to interpret formula (13.7). We start by analysing the situation where
$P$ is located at the center of a unit circle and all $n$ fixed points are equally distributed along
the circumference. For this ideal case we get


$$ \operatorname{tr}\Sigma = \frac{1}{n}\left(1 + \frac{n^2}{(n - 1)n - 2\bigl(-\tfrac{n}{2}\bigr)}\right) = \frac{2}{n}. $$


We define the point error $\sigma_P$ of $P$ as $\sigma_P = \sqrt{\operatorname{tr}\Sigma/2} = n^{-1/2}$. Therefore the point error
of $P$ diminishes inversely proportional to the square root of the number $n$ of fixed points.
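
The ideal case is easy to confirm numerically. The sketch below is an illustration, not part of the original derivation: it evaluates the trace of the position block of $(A^TA)^{-1}$ for unit weights $c_i = d_i = 1$, using the sums $\sum u_i$, $\sum v_i$, and $\sum s_i^{-2}$ that enter (13.6) and (13.7). The function names `trace_position_covariance` and `circle_points` are hypothetical.

```python
import math


def trace_position_covariance(points, P=(0.0, 0.0)):
    """Trace of the (x, y) block of (A^T A)^{-1} for unit weights c_i = d_i = 1."""
    n = len(points)
    su = sv = s_inv2 = 0.0
    for X, Y in points:
        dx, dy = X - P[0], Y - P[1]
        s2 = dx * dx + dy * dy
        su += dx / s2            # u_i = (X_i - X) / s_i^2
        sv += dy / s2            # v_i = (Y_i - Y) / s_i^2
        s_inv2 += 1.0 / s2       # s_i^{-2} = u_i^2 + v_i^2
    # n * sum(s^-2) - (sum u)^2 - (sum v)^2 is the denominator in (13.6);
    # it equals (n - 1) sum(s^-2) - 2 sum_{i<j} cos(nu_ij)/(s_i s_j) in (13.7).
    denom = n * s_inv2 - su * su - sv * sv
    return (1.0 + n * s_inv2 / denom) / s_inv2


def circle_points(n, radius=1.0):
    """n fixed points equally distributed along a circle around the origin."""
    return [(radius * math.cos(2.0 * math.pi * i / n),
             radius * math.sin(2.0 * math.pi * i / n)) for i in range(n)]


tr4 = trace_position_covariance(circle_points(4))     # ideal case, n = 4
sigma_P = math.sqrt(tr4 / 2.0)                        # point error of P
tr_off = trace_position_covariance(circle_points(4), P=(0.5, 0.0))
```

For the symmetric configuration the trace equals $2/n$; moving $P$ away from the center, as in the last line, changes the sums $\sum u_i$ and $\sum v_i$ and hence the trace.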

Our example concentrated on the unit circle. Now we multiply the network scale
with a constant $k$. Thus all distances are multiplied by $k$ and we get a new covariance
matrix $\Sigma^*$ for which we have $\operatorname{tr}\Sigma^* = k^2\operatorname{tr}\Sigma$, which also was to be expected.

Finally we return to the original model with all fixed points lying on the unit circle.
But now we move the point $P$ outside the circle at a distance $k$ times the radius 1 from the
center. We specialize to $n = 2$ and then it is possible to have $s_1 = s_2$:

try} = k 3k’ ke +2 X 2 k. 
2k3-k?4+2 = 


For large values of $k$ we approximately have $s_i \approx k$ and we get


R k (2n — 1)k? — (n — 1)c? +2(n — 1) zi 


A 
n (n— Dk? — (n — 1)? +2(n — 1) + z27)k. 


3 j=- 
This expression tells us that a multiplication by $k$ can be compensated for by introducing more
fixed points $n$. If we move further away from the area of the fixed points we must use more
fixed points to maintain an unchanged trace of the covariance matrix.


Remark 13.3 The present problem can be viewed as a combination of distance and direction
observations. Accordingly the coefficient matrix $A$ in (13.3) can be split into
$\bigl[\begin{smallmatrix} A_1 \\ A_2 \end{smallmatrix}\bigr]$. If we estimate the two observation types separately, the trace of the common
covariance matrix is $\operatorname{tr}\bigl((A_1^TA_1)^{-1} + (A_2^TA_2)^{-1}\bigr)$. In general this value is different from
$\operatorname{tr}\bigl((A_1^TA_1 + A_2^TA_2)^{-1}\bigr)$ based on a joint estimation of the problem. If we in (13.4) put
$c_i = 1$ and $d_i = 0$ we get

$$ \operatorname{tr}\bigl((A_1^TA_1)^{-1}\bigr) = \frac{\sum u_i^2 + \sum v_i^2}{\sum u_i^2\sum v_i^2 - (\sum u_iv_i)^2}. $$
The similar expression for $\operatorname{tr}\bigl((A_2^TA_2)^{-1}\bigr)$ is complicated.

The least squares problem of a free stationing is an elegant example of the application
of the Korn inequality. This inequality yields bounds for how much the point variance
changes if we change the weight of some distance or direction observations.


Free stationing was first described by W. Snellius in 1617 in his book “Eratosthenes
Batavus.”


13.3 Station Adjustment 


When observing directions between stations, one station is usually selected as a reference 
object. There are no conditions attached to the choice. Readings are subsequently taken 
to the other stations, swinging clockwise, and a second reading is taken beginning with 
the last station and swinging back counterclockwise. The observations are completed with 
a reading to the reference object. The observational result of a full round is the mean 
values b; of the first and the second readings. The total number of rounds depends on the 
precision required. 

We apply a least-squares procedure to the observations. It will be seen that an es- 
timated direction is the mean of its corresponding observations reduced to the reference 
object. The directions have equal variance and are uncorrelated. 

The number of directions is denoted by $r$, and the number of rounds by $s$. There are
$m = rs$ observations in all. The $r$ observations in the first round and the first observation
of each subsequent round are the necessary ones. This yields a sum total of $n = r + s - 1$.
The number of redundant observations then becomes $m - n = (r - 1)(s - 1)$.

Selecting the directions as unknowns x; would give r unknowns. In order to compare 
rounds, the observations must be reduced to the same reference direction which may be 
chosen at will. Normally the value of the first direction is put equal to zero. For this 
purpose an orientation unknown z; must be introduced as an unknown in each round. One 
of the s orientation unknowns is superfluous (any of them may be chosen as zero). It is 
easier to maintain symmetry and impose the condition 


$$ z_1 + z_2 + \cdots + z_s = 0 $$
or
$$ e^Tz = 0. \qquad (13.8) $$


Figure 13.3 Station adjustment.

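The observational equations express that the observation plus the orientation unknown plus
the residual equals the unknown value for the direction:


$$ Ay = \begin{bmatrix}
1 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 1 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & -1 & 0 \\
0 & 1 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 1 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 & 0 & -1 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & 0 & 0 & -1 \\
0 & 0 & 1 & 0 & 0 & 0 & -1 \\
0 & 0 & 0 & 1 & 0 & 0 & -1
\end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ z_1 \\ z_2 \\ z_3 \end{bmatrix} = b - r. \qquad (13.9) $$
The first four columns of $A$ form the block $B$ and the last three columns form the block $C$.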

Note that $A$ is an incidence matrix. We know that $Ae = 0$. This rank deficiency of $A^TA$
may be removed by orthogonal bordering to produce the augmented matrix


$$ \begin{bmatrix} A^TA & e \\ e^T & 0 \end{bmatrix}. $$


This is nonsingular and then $(A^TA)^+$ results from the inverse by deleting the last row and
column. But we shall follow an alternative development.
The structure of the least-squares problem (13.9) suggests splitting it into two in the
following manner (the notation $C$ has nothing to do with a weight matrix):


$$ Ay = \begin{bmatrix} B & C \end{bmatrix} \begin{bmatrix} x \\ z \end{bmatrix} = b - r. \qquad (13.10) $$


If we suppose equal weights for all observations, the normal equations are


$$ B^TBx + B^TCz = B^Tb \qquad (13.11) $$
$$ C^TBx + C^TCz = C^Tb. \qquad (13.12) $$


We solve (13.11) to find
$$ x = (B^TB)^{-1}B^Tb - (B^TB)^{-1}B^TCz. $$


An easy calculation reveals that $(B^TB)^{-1} = \frac{1}{s}I$ and $B^TCz = -Ez$ where $E$ denotes a
matrix consisting of ones. Now, by (13.8) we have $Ez = 0$, and then $B^TCz = 0$. This
yields the nice estimate


$$ \hat{x} = \tfrac{1}{s}B^Tb. \qquad (13.13) $$
Insertion into (13.12) yields


$$ \tfrac{1}{s}C^TBB^Tb + C^TCz = C^Tb. $$


Then
$$ z = (C^TC)^{-1}C^T\bigl(I - \tfrac{1}{s}BB^T\bigr)b $$
and finally
$$ \hat{z} = \tfrac{1}{r}C^T\bigl(I - \tfrac{1}{s}BB^T\bigr)b \qquad (13.14) $$
where we have used $(C^TC)^{-1} = \frac{1}{r}I$. However, the variance calculations remain. The
covariance matrix for $\hat{x}$ becomes
$$ \Sigma_{\hat{x}} = \hat{\sigma}_0^2(B^TB)^{-1} = \tfrac{1}{s}\hat{\sigma}_0^2I \qquad (13.15) $$


which shows that all $\hat{x}_i$ have the same variance $\hat{\sigma}_0^2/s$ and furthermore they are independent!
The only way to reduce the variances is by increasing $s$, that is, by observing more rounds.
The sum of squared residuals is estimated by


$$ \hat{r}^T\hat{r} = b^Tb - b^TA\hat{y} = b^T(b - B\hat{x} - C\hat{z}) $$
$$ = b^T\bigl(b - \tfrac{1}{s}BB^Tb - \tfrac{1}{r}CC^Tb + \tfrac{1}{rs}CC^TBB^Tb\bigr) $$
$$ = b^T\bigl(I - \tfrac{1}{s}BB^T - \tfrac{1}{r}CC^T + \tfrac{1}{rs}CC^TBB^T\bigr)b $$
$$ = b^T\bigl(I - \tfrac{1}{s}BB^T - \tfrac{1}{r}CC^T(I - \tfrac{1}{s}BB^T)\bigr)b $$
$$ = b^T\bigl(I - \tfrac{1}{r}CC^T\bigr)\bigl(I - \tfrac{1}{s}BB^T\bigr)b. $$
Finally


$$ \hat{\sigma}_0^2 = \frac{b^T\bigl(I - \tfrac{1}{r}CC^T\bigr)\bigl(I - \tfrac{1}{s}BB^T\bigr)b}{(r - 1)(s - 1)}. \qquad (13.16) $$
Given a set of rounds we want to estimate directions and orientation unknowns. The best
estimate for the directions is, not surprisingly, the mean of the measured directions. All
estimates of directions are independent and of equal variance $\hat{\sigma}_0^2/s$, where $\hat{\sigma}_0^2$ is given
by the expression (13.16). Today station adjustment is used only for educational reasons,
to estimate $\hat{\sigma}_0^2$ for a given data set taken with a given theodolite.

In professional connections the original sets of rounds are entered into a least-squares
procedure. The orientation unknowns $z_i$ in most cases are of no further interest so they are
eliminated from the normals according to the technique described in Section 11.7.




Example 13.1 Let us assume observations (in gons) for 3 rounds each containing 4 di- 
rections. The observations are given as reduced rounds, i.e. all observations in each round 
are diminished by the observational value of the first direction which consequently has 
value zero. This zero column is omitted. Hence there are only 3 significant columns. The 
numbers are arranged in a form suitable for input to MATLAB. 


50.193 77.873 268.441 
50.187 77.871 268.439 
50.188 77.872 268.440 


The estimated directions are


    0.0000 gon
    50.1893 gon
    77.8720 gon
    268.4400 gon.


The estimated orientation unknowns are


    z_1 = −1.4 mgon
    z_2 = 1.1 mgon
    z_3 = 0.3 mgon.

The standard deviation of unit weight is $\hat{\sigma}_0 = 1.4$ mgon.


The M-file sets computes the estimated directions $\hat{x}_i$ and orientation unknowns $\hat{z}_i$.
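
For readers without MATLAB, the explicit solution (13.13), (13.14), and (13.16) is short enough to write out directly. The following is a minimal Python sketch and not a translation of the M-file mentioned above; the function name `station_adjustment` is hypothetical. It restores the omitted zero column of the reduced rounds and exploits the structure of $B$ and $C$ instead of forming the matrices.

```python
import math

# Reduced rounds from Example 13.1 (gon); the omitted zero column is restored.
ROUNDS = [
    [0.000, 50.193, 77.873, 268.441],
    [0.000, 50.187, 77.871, 268.439],
    [0.000, 50.188, 77.872, 268.440],
]


def station_adjustment(rounds):
    s = len(rounds)        # number of rounds
    r = len(rounds[0])     # number of directions
    # (13.13): x_hat = (1/s) B^T b, i.e. the mean of each direction over the rounds
    x_hat = [sum(rd[i] for rd in rounds) / s for i in range(r)]
    # (13.14): z_hat = (1/r) C^T (I - BB^T/s) b. C holds -1 entries, so z_j is
    # minus the mean of (observation - x_hat) within round j.
    z_hat = [-sum(rd[i] - x_hat[i] for i in range(r)) / r for rd in rounds]
    # Residuals b - B x_hat - C z_hat, then (13.16) with (r-1)(s-1) redundancies
    ssq = sum((rd[i] - x_hat[i] + z_hat[j]) ** 2
              for j, rd in enumerate(rounds) for i in range(r))
    sigma0 = math.sqrt(ssq / ((r - 1) * (s - 1)))
    return x_hat, z_hat, sigma0


x_hat, z_hat, sigma0 = station_adjustment(ROUNDS)
z_mgon = [1000.0 * z for z in z_hat]     # orientation unknowns in mgon
sigma0_mgon = 1000.0 * sigma0            # standard deviation of unit weight in mgon
```

Applied to the three rounds of the example, the sketch agrees with the estimated directions, orientation unknowns, and standard deviation of unit weight listed above to the printed precision, and the orientation unknowns sum to zero as required by (13.8).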


13.4 Fitting a Straight Line 


Let $p$ observations of coordinates be given as pairs $(x_i, y_i)$. The pairs are not symmetric
in the sense that only the $x_i$ values are subject to errors, and the $y_i$ values are considered
observed exactly. (In the case where both coordinates are stochastic variables the situation
is more complex.) Geometrically, we want to regard them as coordinates of $p$ points in
the plane. The least-squares problem may be described by a linear condition between $x_i$
and $y_i$:


$$ x_i = a + \cot\varphi\; y_i, \qquad (13.17) $$


with $a$ and $\varphi$ as unknowns.

Figure 13.4 Fitting line.

Some readers will probably argue that the problem is the usual linear regression, which
minimizes the sum of squares of $l_i$ in the direction parallel to the $x$-axis. This is not the
case; we want to minimize $l_i\sin\varphi$ which is a quantity orthogonal to the fitting line, see
Figure 13.4. In statistics our problem is called linear orthogonal regression. In numerical
analysis it is an example of total least squares. The observation equations are


$$ (a + y_i\cot\varphi - x_i)\sin\varphi = a\sin\varphi + y_i\cos\varphi - x_i\sin\varphi = -r_i. \qquad (13.18) $$


Thus the unknowns are $a$ and $\varphi$ and so we see that a simple least-squares problem may
result in a not so elementary nonlinear problem.

Returning to the fundamental principle of least squares we find a surprisingly simple 
solution. By partial differentiation of the sum of squares $s = \sum r_i^2$ and equating to zero 
we get 

$$\frac{\partial s}{\partial a} = 2\sin\varphi \sum_{i=1}^{p} (a\sin\varphi + y_i\cos\varphi - x_i\sin\varphi) = 0. \tag{13.19}$$

Similarly the second minimum condition may be expressed as 

$$\frac{\partial s}{\partial\varphi} = 2\sum_{i=1}^{p} (a\sin\varphi + y_i\cos\varphi - x_i\sin\varphi)(a\cos\varphi - y_i\sin\varphi - x_i\cos\varphi) = 0. \tag{13.20}$$


We rewrite (13.19) and get 

$$p\,a\sin\varphi + \cos\varphi \sum y_i - \sin\varphi \sum x_i = 0$$

or 

$$\hat a = \frac{1}{p}\Bigl(\sum x_i - \cot\varphi \sum y_i\Bigr). \tag{13.21}$$

Once again we introduce the barycenter $(x_s, y_s)$ defined by $x_s = \sum x_i/p$ and $y_s = \sum y_i/p$. 
We reduce the observations $(x_i, y_i)$ to this center 

$$x_i^* = x_i - x_s \quad\text{and}\quad y_i^* = y_i - y_s$$

and substitute in (13.21): 

$$\hat a = \frac{1}{p}(p\,x_s - p\,y_s\cot\varphi) = x_s - \cot\varphi\; y_s \tag{13.22}$$

or $x_s = \hat a + \cot\varphi\; y_s$. In view of (13.17) we conclude that the fitting line contains the 
barycenter $(x_s, y_s)$. 

Finally we shall find an expression for $\varphi$. Substituting the $(x_i^*, y_i^*)$ pairs into (13.20) 
we can put $a = 0$ and obtain 

$$-\sum y_i^{*2}\sin\varphi\cos\varphi - \sum x_i^* y_i^*(\cos^2\varphi - \sin^2\varphi) + \sum x_i^{*2}\sin\varphi\cos\varphi = 0$$

or after introducing the double angle $2\varphi$ 

$$\sin 2\varphi \sum (x_i^{*2} - y_i^{*2}) - 2\cos 2\varphi \sum x_i^* y_i^* = 0.$$

Finally the angle is determined by 

$$\tan 2\varphi = \frac{2\,x^{*T} y^*}{x^{*T} x^* - y^{*T} y^*}$$

where $x^* = (x_1^*, x_2^*, \ldots, x_p^*)$ and $y^* = (y_1^*, y_2^*, \ldots, y_p^*)$. The expression yields two 
solutions for $\varphi$; they differ by a right angle. For any given context it should be possible to 
select the correct value. 


Example 13.2 Suppose $(x_i, y_i) = (0, 0)$, $(3, 1)$ and $(12, 2)$. The barycenter is $(x_s, y_s) = 
(5, 1)$ and consequently $(x_i^*, y_i^*) = (-5, -1)$, $(-2, 0)$ and $(7, 1)$. So 

$$\tan 2\varphi = \frac{24}{78 - 2} = 0.31579$$

or $\varphi = 9.736$ gon. Then $\cot\varphi = 6.49$ and $\hat a = 5 - 6.49 \cdot 1 = -1.49$. The fitting line has 
the equation 

$$x = -1.49 + 6.49\,y.$$


For comparison, the line of linear regression is x = —1 + 6y. 
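
The computation in Example 13.2 can be checked numerically. The Python sketch below is an illustration, not one of the book's M-files, and the function name `orthogonal_line_fit` is hypothetical. It uses `atan2` to pick, of the two solutions for $\varphi$, the one that minimizes the sum of squares.

```python
import math

def orthogonal_line_fit(points):
    """Fit x = a + cot(phi) * y by minimizing orthogonal distances."""
    p = len(points)
    xs = sum(x for x, _ in points) / p          # barycenter
    ys = sum(y for _, y in points) / p
    sxx = sum((x - xs) ** 2 for x, _ in points)
    syy = sum((y - ys) ** 2 for _, y in points)
    sxy = sum((x - xs) * (y - ys) for x, y in points)
    # tan(2 phi) = 2 sxy / (sxx - syy); atan2 selects the root that
    # minimizes (rather than maximizes) the sum of squared distances.
    phi = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    cot_phi = math.cos(phi) / math.sin(phi)     # undefined for a line y = const
    a = xs - cot_phi * ys                       # the line contains the barycenter
    return phi, cot_phi, a

phi, cot_phi, a = orthogonal_line_fit([(0, 0), (3, 1), (12, 2)])
phi_gon = phi * 200.0 / math.pi
print(round(phi_gon, 3), round(cot_phi, 2), round(a, 2))
```

A line parallel to the $x$-axis ($\varphi = 0$) cannot be written in the form (13.17), so the sketch does not handle that case.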


Example 13.3 The problem of fitting $p$ pairs of coordinates $(x_i, y_i)$ to a straight line 

$$cx + sy = h \quad\text{and}\quad c^2 + s^2 = 1$$

can also be solved by means of a singular value decomposition of the matrix $Y$ containing 
the reduced coordinates $(x_i^*, y_i^*)$. The column of $V$ pertaining to the largest singular value 
in the decomposition $Y = U\Sigma V^T$ plays a fundamental role. We finish with a MATLAB 
code for solving the problem in our particular coordinate system. 
M-file: lor

function lor(Y)
% LOR  Linear orthogonal regression.
%      See Dahlquist & Björck, 2nd edition, (1995) Example 7.6.2.
%      The first column of Y contains x and the second column y
%      for p given points

[p,q] = size(Y);
e = ones(p,1);
m = mean(Y);           % barycenter as a 1 by 2 row vector
Y = Y - e*m;           % reduce the coordinates to the barycenter
[U,S,V] = svd(Y);
cs = V(:,1);           % column of V for the largest singular value

alpha = atan2(cs(2),cs(1));
t = tan(alpha)
x = tan(-alpha + pi/2);
% necessary statement due to special orientation of coordinate system
a = m(2) - 1/x


Part III 


Global Positioning System (GPS) 
GLOBAL POSITIONING SYSTEM 


14.1 Positioning by GPS 


GPS has revolutionized the science of positioning and Earth measurement. One part of 
that revolution is accuracy, another part is speed and simplicity. A third part is cost. All of 
these improvements are contributing to the growth of major applications. We frankly hope 
that our readers will develop new uses for GPS; the technology is ready, only imagination 
is needed. And the initiative to turn imagination into reality. 

But this is a scientific book, not a brochure. We focus on one major advantage of 
GPS: accuracy. The inherent accuracy of a GPS receiver can be enhanced or degraded. It 
is enhanced by careful processing, it is degraded by accepting (instead of trying to elimi- 
nate) significant sources of error. We start by listing three critical techniques for achieving 
centimeter and even millimeter accuracy in positioning: 


1 Work with two or more receivers. The key idea of differential GPS (DGPS) is to 
compute differences of position instead of absolute position. Errors that are shared 
by receivers will cancel when we form differences. 


2 Repeat the measurements. A sequence of observations has a significantly smaller 
variance than a single observation. If the receiver is moving, the Kalman filter can 
account for changes of state as well as new observations. 


3 Estimate each source of error in the observations. Section 14.2 describes the major 
errors and their approximate magnitudes. 


The reader already knows about the dithering of the satellite clocks. President Clinton 
directed that the military should terminate this “Selective Availability” before the year 
2006. All indications are favorable; we would not be surprised to see it end earlier. The 
2-dimensional rms positioning error from this source can be close to 100 meters (and it is 
removed by DGPS, when two receivers are measuring their range from the same satellites). 

We strongly emphasize the importance of time. In GPS, time is the fourth dimension. 
It is the reason we need at least four satellites, not three, to locate the receiver. The four 
coordinates to be computed are $x$, $y$, $z$, and $c\,dt$; the speed of light multiplies the clock 
discrepancy. This quantity $c\,dt$ has the units of distance. Since an ordinary receiver clock 
might only know the time within a few seconds, the elimination of error from $c\,dt$ is not 
an optional improvement: it is absolutely required! 

In short: The key to the accuracy of GPS is a precise knowledge of the satellite orbits 
and the time. On the ground, Keplerian elements are computed from the actually observed 
orbits (each set of quasi-Keplerian elements is good for 2 hours) and these elements 
are uploaded to the satellites' memories. The satellites carry atomic clocks (cesium and 
rubidium). They broadcast their own quasi-Keplerian elements for position computation 
at receivers. They also broadcast with lower accuracy (in the almanac) the Keplerian ele- 
ments of other satellites. But it is the precise ephemerides that locate one end of the line 
segment between satellite and receiver. The problem of GPS positioning is to locate the 
other end. 

One basic fact about GPS deserves attention. Its measurements yield distances and 
not angles. We are dealing with trilateration and not triangulation. This has been desired 
for centuries, because angles are definitely awkward. Of course lengths are nonlinear too, 
in the position coordinates $x$, $y$, $z$, $c\,dt$. The receiver must solve nonlinear equations. 

The purpose of this chapter is to explain in reasonable detail how GPS works, and 
where mathematics is involved. Then later chapters will describe the actual calculations in 
much more detail: 


Chapter 15: Accurate processing of GPS observations. 
Chapter 16: Random errors and their covariances. 


Chapter 17: Kalman filtering of observations as the state changes. 


We will also describe the MATLAB software that is freely available to the reader. For this 
introduction to GPS, the lecture by Ponsonby (1996) has been particularly helpful. 


Clock Errors and Hyperbolas of Revolution 


The goal is to get a fix on the receiver’s position. Suppose there were no clock errors (which 
is false). Then the distances from three satellites would provide a fix. Around each satel- 
lite, the known distance determines a sphere. The first two spheres intersect in a circle. 
Assuming that the satellites do not lie on a straight line, the third sphere will normally 
cut this circle at two points. One point is the correct receiver position, the other point is 
somewhere out in space. So three satellites are sufficient if all clocks are correct and all 
ranges are measured precisely. 

In reality the receiver clock is typically inexpensive and inaccurate. When the clock 
error is dt, every range measured at that instant will be wrong by a distance c dt. We are 
measuring the arrival time of a signal that contains its own departure time. (The velocity 
of light is $c \approx 300$ m/μsec. Of course we would use many more correct digits for $c$, which 
is slightly different in the ionosphere and the troposphere. These are among the errors to 
be modeled.) The incorrect range, which includes c dt from the unknown clock error, is 
called a pseudorange. 

From two satellites we have two pseudoranges $\rho^1$ and $\rho^2$. Their difference $d^{12} = 
\rho^1 - \rho^2$ has no error $c\,dt$ from the receiver’s clock. The receiver must lie on a hyperbola of 
revolution, with the two satellites as the foci. This is the graph of all points in space whose 
distances from the satellites differ by $d^{12}$. 

The third pseudorange locates the receiver on another hyperbola of revolution (a hy- 
perboloid). It intersects the first in a curve. The fourth pseudorange contributes a third 
independent hyperboloid, which cuts the curve (normally twice). Provided the four satel- 
lites are not coplanar, we again get two possible locations for the receiver: the correct fix, 
and a second point in space that is far from correct and readily discarded. This is the ge- 
ometry from the four pseudoranges, and the algebra is straightforward but nonlinear: for 
k = 1, 2,3, 4 we know that 


$$(x - X^k)^2 + (y - Y^k)^2 + (z - Z^k)^2 = (\rho^k - c\,dt)^2.$$


The Bancroft algorithm to solve for x, y, z, and c dt is described in Section 15.7. 
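
To make the nonlinear algebra concrete, here is a minimal Python sketch that solves the four pseudorange equations by Newton iteration. It is not the Bancroft algorithm of Section 15.7. The satellite coordinates, the receiver position, and the clock error are made-up values chosen only to test the iteration, and the helper names (`solve_linear`, `fix_from_pseudoranges`) are hypothetical.

```python
import math

C = 299792458.0  # speed of light in vacuum, m/s

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fix_from_pseudoranges(sats, rho, iterations=10):
    """Newton iteration for (x, y, z, c*dt) from four pseudoranges."""
    est = [0.0, 0.0, 0.0, 0.0]   # start at the Earth's center, no clock error
    for _ in range(iterations):
        J, res = [], []
        for sat, r in zip(sats, rho):
            d = distance(est[:3], sat)
            # Derivatives of (distance + c*dt) with respect to x, y, z, c*dt.
            J.append([(est[i] - sat[i]) / d for i in range(3)] + [1.0])
            res.append(r - (d + est[3]))
        step = solve_linear(J, res)
        est = [e + s for e, s in zip(est, step)]
    return est

# Synthetic test case: every number below is invented for illustration.
R = 26560e3
sats = [(R, 0.0, 0.0), (0.6 * R, 0.8 * R, 0.0),
        (0.6 * R, -0.4 * R, 0.6928 * R), (0.6 * R, -0.4 * R, -0.6928 * R)]
truth = (6371e3, 0.0, 0.0)
c_dt = C * 1e-4                      # 0.1 ms clock error, about 30 km
rho = [distance(s, truth) + c_dt for s in sats]

x, y, z, b = fix_from_pseudoranges(sats, rho)
print(x, y, z, b)
```

Starting from the Earth's center steers the iteration toward the solution near the Earth rather than the second intersection point far out in space.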

Note the important step that remains. The receiver must convert this spatial fix into 
a position on a standard geodetic reference system. For GPS this reference is WGS 84. 
The Russian system GLONASS now uses the slightly different reference PZ-90. Then the 
receiver employs a model of the geoid to compute geographical coordinates and height 
above sea level. An ordinary receiver displays latitude and longitude, which allows the 
user to find the position on a map. Not taking into account the correction from WGS 84 to 
the map projection may be the most error-prone step of all! (Map projections would apply only for 
navigation.) Existing charts are unlikely to be accurate at the centimeter level. Still they 
are probably sufficient for the immediate purposes of typical users (to head toward their 
destination, to locate a landmark, to save their lives, ... ). 


Radio Signals 


The GPS satellites transmit radio signals that are phase modulated according to a known 
sequence of bits. These are DSSS signals (Direct Sequence Spread Spectrum). A very 
useful simplified model is given by Ponsonby (1996), showing a switch SW1 that reverses 
the phase of the sine wave according to a (known) pseudorandom bit stream. The time 
interval T is the chipping time, and over each interval the sine wave is multiplied by +1 
or —1. Then the satellite transmits these positive and negative segments of sine waves. 

The signal coming into the receiver is chopped up by a second switch SW2. This also 
follows a pseudorandom pattern. Everything depends on whether the two chip sequences 
are coherent. If not, the output signal is highly chopped in time and widely spread in 
frequency. Very little power passes through a filter. But if the switching sequences at SW1 
and SW2 are exactly matched, the output signal is back to a perfect sine wave. The wave 
passes through the filter with maximum gain in the power level. This timing alignment is 
described by Ponsonby as the equivalent of fitting a key into a lock. 

Once the sequences are aligned, a Delay Lock Loop maintains the synchronism. 
And since the receiver is moving with respect to the transmitter (the speed of the satellite 
is about 3900 m/sec), there will be a Doppler shift in the frequency of the pure sine wave. 
This frequency shift provides a good measurement of velocity. The receiver displays its 
velocity converted to Earth coordinates. Of course differences in position give a direct 
(non-Doppler and less accurate) estimate of velocity when divided by $\Delta t$. 

Actual Doppler observations are very receiver-dependent, sometimes directly from 
the phase and sometimes by separate software. Phase tracking over several seconds will 
produce a much more accurate velocity than observations over milliseconds. So Doppler 
is application-dependent: ship navigation is very different from aircraft. 


C/A Code and P Code 


The actual modulated signal is divided into two independent components. They are modu- 
lated by different bit sequences. The slower Coarse/Acquisition (C/A) code has a chipping 
rate of 1.023 MHz, and the Precision (P) code is 10 times faster. The C/A code is available 
to all users. The encrypted P code is referred to as the Y code which is reserved for the 
military (although it is partly useful to others also). 

All GPS satellites use the same carrier frequencies. Each satellite has its own pseudo- 
random sequence of 1023 bits (its periodic C/A code which is a Gold Code). The repeat 
time is 1 ms for the C/A code (1023 chips at 1.023 MHz) and one week for the P code. It is most important to know 
that there are (at present) two coherent transmission frequencies for the P code (and its 
secret encryption, the Y code): 


$L_1$ is 1575.42 MHz and $L_2$ is 1227.60 MHz. 


The two frequencies are differently delayed by the ionosphere. A receiver that accepts both 
frequencies can compute this delay (because it is known to be proportional to $1/f^2$). This 
correction to the speed of light through the ionosphere is essential. A single frequency 
receiver has to make do with estimates of the ionospheric correction whose parameters are 
broadcast by the satellites. 

At this time there is wide discussion of the proposal for a new civilian frequency $L_5$ 
(then the military might remove $L_2$ from civilian use). The difficulties of agreeing on a 
new frequency before launching expensive satellites seem to have frustrated everyone. 


Pseudorandom Noise Code Generation 


The pseudorandom noise (PRN) codes transmitted by the GPS satellites are deterministic 
binary sequences with noise-like properties. Each C/A and P code is generated using a 
tapped linear feedback shift register (LFSR). 

A shift register is a set of one bit storage or memory cells. When a clock pulse is 
applied to the register, the content of each cell shifts one bit to the right. The content of 
the last cell is “read out” as output. The special properties of such shift registers depend 
on how information is “read in” to cell 1. 

For a tapped linear feedback shift register, the input to cell 1 is determined by the 
state of the other cells. For example, the binary sum from cells 3 and 10 in a 10-cell register 
could be the input. If cells 3 and 10 have different states (one is 1 and the other 0), a 1 
will be read into cell 1 on the next clock pulse. If cells 3 and 10 have the same state, 
0 will be read into cell 1. If we start with 1 in every cell, 12 clock pulses later the contents 
will be 0010001110. The next clock pulse will take the 1 in cell 3 and the 0 in cell 10, 
and place their sum (1) in cell 1. Meanwhile, all other bits have shifted one cell to the right, 
and the 0 in cell 10 becomes the next bit in the output. A shorthand way of denoting this 
particular design is by the modulo 2 polynomial $f(x) = 1 + x^3 + x^{10}$. Such a polynomial 
representation is particularly useful because if $1/f(x) = h_0 + h_1 x + h_2 x^2 + h_3 x^3 + \cdots$, 
then the coefficients $h_0, h_1, h_2, \ldots$ form the binary output sequence. 
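
The register just described is easy to simulate. The following Python sketch (the function name `lfsr_step` is hypothetical) clocks a 10-cell register with taps at cells 3 and 10, starting from all ones, and then counts how many pulses pass before the all-ones state returns.

```python
def lfsr_step(cells, taps):
    """Apply one clock pulse; return (new_cells, output_bit).

    cells[0] is cell 1. The output is the content of the last cell,
    and the new content of cell 1 is the binary sum of the tapped cells.
    """
    feedback = 0
    for t in taps:
        feedback ^= cells[t - 1]
    return [feedback] + cells[:-1], cells[-1]

# Twelve clock pulses from the all-ones state.
cells = [1] * 10
output = []
for _ in range(12):
    cells, bit = lfsr_step(cells, taps=(3, 10))
    output.append(bit)
state_after_12 = "".join(str(c) for c in cells)
print(state_after_12)          # 0010001110, as stated in the text

# Count the pulses until the all-ones state comes back.
cells, period = [1] * 10, 0
while True:
    cells, _ = lfsr_step(cells, taps=(3, 10))
    period += 1
    if cells == [1] * 10:
        break
print(period)                  # 1023 for a maximal length register
```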

The C/A code is generated by two 10-bit LFSRs of maximal length ($2^{10} - 1$). One 
is the $1 + x^3 + x^{10}$ register already described, and is referred to as G1. The other, G2, has 
$f(x) = 1 + x^2 + x^3 + x^6 + x^8 + x^9 + x^{10}$. Cells 2, 3, 6, 8, 9, and 10 are tapped and binary 
added to get the new input to cell 1. In this case, the output comes not from cell 10 but 
from a second set of taps. Various pairs of these second taps are binary added. The different 
pairs yield the same sequence with different delays or shifts (as given by the ‘shift and add’ 
or ‘cycle and add’ property: a chip-by-chip sum of a maximal length register sequence and 
any shift of itself is the same sequence except for a shift). The delayed version of the G2 
sequence is binary added to the output of G1. That becomes the C/A code. The G1 and 
G2 shift registers are set to the all ones state in synchronism with the epoch of the X1 
code used in the generation of the P code (see below). The various alternative pairs of G2 
taps (delays) are used to generate the complete set of 36 unique PRN C/A codes. These 
are Gold codes, Gold (1967), Dixon (1984), and any two have a very low cross correlation 
(are nearly orthogonal). 

There are actually 37 PRN C/A codes, but two of them (34 and 37) are identical. The 
first 32 codes are assigned to satellites. Codes 33 through 37 are reserved for other uses 
including ground transmitters. 
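
A short Python sketch shows how the two registers combine into a C/A code. The tap pairs used below, (2, 6) for PRN 1 and (3, 7) for PRN 2, are not given in this text; they are taken here as assumptions from the GPS interface specification. The function names are hypothetical.

```python
def ca_code(g2_taps):
    """Return one 1023-chip C/A code period as a list of 0/1 chips."""
    g1 = [1] * 10                      # both registers start in the all-ones state
    g2 = [1] * 10
    chips = []
    for _ in range(1023):
        g1_out = g1[9]                                     # cell 10 of G1
        g2_out = g2[g2_taps[0] - 1] ^ g2[g2_taps[1] - 1]   # second set of taps
        chips.append(g1_out ^ g2_out)
        g1_in = g1[2] ^ g1[9]                              # cells 3 and 10
        g2_in = g2[1] ^ g2[2] ^ g2[5] ^ g2[7] ^ g2[8] ^ g2[9]  # cells 2,3,6,8,9,10
        g1 = [g1_in] + g1[:-1]
        g2 = [g2_in] + g2[:-1]
    return chips

def correlation(a, b, shift):
    """Circular correlation of two codes after mapping 0 -> +1 and 1 -> -1."""
    n = len(a)
    return sum((1 - 2 * a[i]) * (1 - 2 * b[(i + shift) % n]) for i in range(n))

prn1 = ca_code((2, 6))   # assumed tap pair for PRN 1
prn2 = ca_code((3, 7))   # assumed tap pair for PRN 2

first_ten = "".join(str(c) for c in prn1[:10])
peak = correlation(prn1, prn1, 0)
worst_cross = max(abs(correlation(prn1, prn2, s)) for s in range(1023))
print(first_ten, peak, worst_cross)
```

The autocorrelation peak equals the code length, while the cross correlation between the two codes stays small for every shift. That is the sense in which the codes are nearly orthogonal.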

The P code generation follows the same principles as the C/A code, except that 
four shift registers with 12 cells are used. Two registers are combined to produce the X1 
code, which is 15 345 000 chips long and repeats every 1.5 seconds; and two registers are 
combined to produce the X2 code, which is 15 345 037 chips long. The X1 and X2 codes 
can be combined with 37 different delays on the X2 code to produce 37 different one-week 
segments of the P code. Each of the first 32 segments is associated with a different satellite. 


Special and General Relativity 


GPS positioning is one of the very few everyday events in which relativistic effects must 
be accounted for. The whole system is based on clocks, and those clocks are moving. The 
satellite clock is moving with respect to the receiver clock, so time is dilated and Special 
Relativity enters. All the clocks are in a gravitational field (of the Earth), so General 
Relativity is significant. And the Earth is rotating; light follows a spiral path. We cannot 
perfectly synchronize the clocks! 

The Sagnac Effect from the rotation is also fascinating. It destroys Einstein Syn- 
chronization, which depends on a constant speed of light. That constancy is restricted to 
inertial frames (no relative acceleration). The rotation of the Earth means that clock A can 
be synchronized with B, and B with C, but clock C is not synchronized with A. So we need 
a universal time that goes at a different rate from local time. This coordinated Universal 
Time is maintained at the GPS control center in Colorado Springs. 
Figure 14.1 The GPS constellation: 6 planes with 4 satellites. The 2 orbits/day fall 
15 minutes short of completion, producing the small gaps at the end of each orbit. The 
configuration corresponds to GPS week number 907 starting at 324 000 seconds of week. 


Orbits of the Satellites 


The satellites stay at an altitude of approximately 3 Earth radii. They travel in nearly 
circular orbits, two complete orbits in each sidereal day. The inclinations to the Equatorial 
plane are 55°. In practice a pseudorange is most reliable when the satellite is at least 10° or 
better 15° above the horizon. Figure 14.1 shows in two dimensions the GPS constellation 
of 24 satellites, with four satellites spaced around each of six orbital planes. Actually we 
count 25 satellites in Figure 14.1. Tomorrow it might be 24 or 26! 


Differential GPS 


Suppose two receivers are reasonably close together (say less than 100 kilometers). Then 
the signals from a satellite that is 20 000+ kilometers away will reach the two receivers 
along very close paths. The delays in the ionosphere will be nearly identical. The errors due 
to an incorrect satellite clock (which has been dithered to achieve Selective Availability) 
are the same. So are the errors in the satellite orbits. Those errors will cancel in the 
difference of travel times to the two receivers. 

We emphasize that only frequency standard variations cancel exactly in differencing. 
Time synchronization, atmospheric effects, and orbit errors are all proportional to station 
separation. Ionospheric gradients are typically 1 part in $10^6$, for example. For precise 
positioning at a single frequency, a 100 km separation is much too large to make the effects 
negligible (in fact they may reach 10 mm even over 1-2 km). With two frequencies, 
they nearly cancel for any distance. Tropospheric effects are also important in precise 
positioning for separations over 1-2 km. Without SA the orbital errors are 20 m, which is 
1 ppm, or 5 cm over 100 km. This is important for tectonics but not for routine surveying 
or navigation. 

DGPS is based on this simple idea. If the position of one “home” receiver is exactly 
known, then its distance from each satellite is easily computed. The travel time at the 
speed of light is then available (divide the distance by c). This theoretical travel time can 
be compared with the measured travel time. The error is found for each satellite in view, 
and these errors are transmitted by the home receiver. The second receiver (or any number 
of roving receivers) will pick up this information and correct their own travel times from 
the satellite, by that same amount. 
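
The bookkeeping behind DGPS can be sketched in Python as follows. This is an illustration under simplifying assumptions: it works with ranges (travel time multiplied by $c$) rather than travel times, it treats the error in each satellite's signal as identical at both receivers, and all coordinates and error values are invented. The function names are hypothetical.

```python
import math

def geometric_range(sat, rcv):
    """Straight-line distance between satellite and receiver, in meters."""
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(sat, rcv)))

def range_corrections(home_pos, sat_positions, measured_ranges):
    """Error seen at the home receiver for each satellite: measured minus computed."""
    return {k: measured_ranges[k] - geometric_range(sat_positions[k], home_pos)
            for k in sat_positions}

def apply_corrections(rover_ranges, corrections):
    """Subtract the transmitted errors from the rover's own measurements."""
    return {k: rover_ranges[k] - corrections[k]
            for k in rover_ranges if k in corrections}

# Invented geometry: two satellites, a home receiver, a rover 10 km away.
sats = {1: (15e6, 10e6, 18e6), 2: (-5e6, 20e6, 15e6)}
home = (3.50e6, 0.8e6, 5.2e6)
rover = (3.51e6, 0.8e6, 5.2e6)
shared_error = {1: 25.0, 2: -12.0}       # meters, common to both receivers

home_meas = {k: geometric_range(sats[k], home) + shared_error[k] for k in sats}
rover_meas = {k: geometric_range(sats[k], rover) + shared_error[k] for k in sats}

corr = range_corrections(home, sats, home_meas)
corrected = apply_corrections(rover_meas, corr)
print(corr, corrected)
```

In practice the errors at the two receivers are only nearly equal, so the cancellation degrades as the separation grows.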

Thus the small investment in an extra receiver-transmitter enables us to cancel part 
of the natural effects of the ionosphere and the unnatural effects of SA. 

The principle of working with differences is used even more in postprocessing of 
GPS data. Double differences (2 satellites as well as 2 receivers) are explained and con- 
stantly applied in Chapter 15. The accuracy achieved by DGPS is remarkable. 


New Uses 


This extremely brief section is for the reader to complete! If our book can encourage new 
ideas and applications for GPS, its authors will be very pleased. Note especially that the 
microprocessor in a GPS receiver has unused capacity. The power is there. It can be 
coupled to other tools. Give it some thought. 


14.2 Errors in the GPS Observables 


There are two fundamental observables, the pseudorange $P_i^k$ and the carrier phase $\Phi_i^k$. As 
usual $k$ specifies the satellite and $i$ identifies the receiver. The phase is much more accurate 
than the pseudorange (even using the P code). Both observables will include errors from 
many sources! Some errors can be removed, others can be reduced, and others are just 
neglected. This section estimates the magnitudes of the errors and our options for dealing 
with them. 

We deal first with Selective Availability. When SA is imposed for military reasons, 
all receivers and users encounter identical errors in the satellite clock. This affects the 
C/A and P code and carrier phase measurements equally. The technique of differential 
GPS (using one receiver in a known position as a reference) can essentially eliminate this 
artificial degrading of the system. If it is not eliminated, the error from SA will dominate 
all others. Parkinson & Spilker Jr. (1996) estimate the rms error (the clock offset multiplied 
by the speed of light) as 20 meters. 
A solitary individual will see this error in the range. When multiplied by the Dilution 
of Precision factors in Section 14.4 (say VDOP = 2.5 in the vertical and HDOP = 2 in 
the horizontal), this implies rms positioning errors of 50 meters vertically and 40 meters 
horizontally. Those are so large compared to the intrinsic GPS errors that we hope SA will 
be turned off before this book is a year old. SA is guaranteed to disappear eventually. 


Delay of the Signal 


Now suppose that SA is eliminated. It is either removed by the military or cancelled by 
forming differences in DGPS. We follow Parkinson & Spilker Jr. (1996) and Kleusberg & 
Teunissen (1996) in estimating the sources of error that remain. 

The largest errors (until we compensate for them) come from the delay when a signal 
travels through the atmosphere. So we first recall the connections between velocity and 
refraction and travel time. Then we estimate the range errors due to incorrect travel times 
and other sources. 

In a vacuum the speed of an electromagnetic wave is $c$. In the atmosphere, this speed 
(the phase velocity) is reduced to $v$. The dimensionless ratio $n = c/v$ is the refractive index. 
The number $v$ is related to the angular frequency $\omega$ and wave number $k$ by $v = \omega/k$. 

In a dispersive medium, these numbers are functions of $\omega$. A packet of waves with 
frequencies near $\omega$ will travel not with the phase velocity $v$ but with the group velocity: 

$$v_{group} = \frac{d\omega}{dk} = v + k\frac{dv}{dk}. \tag{14.1}$$

That is the travel speed of a modulation superimposed onto the carrier wave. The refractive 
index becomes $n_{group} = c/v_{group}$. 

The wave travels from satellite to receiver along the quickest path $S$ (by Fermat’s 
principle of least time). At a planar interface in the medium, this principle yields Snell’s 
law $n_1 \sin z_1 = n_2 \sin z_2$ for the change in the angle $z$ (between the path and the normal 
to the discontinuity). The delay along $S$ of the carrier, in comparison to the straight-line 
path $L$ in a vacuum, is $dt_\Phi$: 

$$dt_\Phi = \int_S \frac{dS}{v} - \int_L \frac{dL}{c}.$$

Most of this delay comes from change of speed, a smaller part comes from change of path. 
Multiplying by $c$, the two parts are 

$$c\,dt_\Phi = \int_L (n - 1)\,dL + \left(\int_S n\,dS - \int_L n\,dL\right). \tag{14.2}$$

For the modulation, which carries the important signal for GPS, $v$ becomes $v_{group}$ and the 
refractive index becomes $n_{group}$. 

The scientific problem is to determine $n$ and $n_{group}$ from properties of the atmosphere, 
the electron density in the ionosphere and the air/water densities in the troposphere. 

Table 14.1 Relative accuracy of clocks 

Crystal wrist watch            $10^{-6}$ 
Geodetic GPS receiver          $10^{-5}$-$10^{-7}$ 
TI, crystal in oven            $10^{-8}$-$10^{-9}$ 
GPS satellite clock, SA on     $10^{-9}$ 
Rubidium                       $10^{-11}$-$10^{-12}$ 
Cesium                         $10^{-12}$-$10^{-13}$ 
Hydrogen maser                 $10^{-15}$ 


Error Budget for GPS 


Here are the principal errors in GPS positioning—their sources and also their approximate 
magnitudes. We estimate errors in the range and multiply by the DOP factors. 


Ephemeris Errors The satellite transmits its Keplerian elements, almost exactly but with 
a small error. This grows from the time of upload by a control station until the next upload. 
The error growth is slow and smooth, and only the projection of the ephemeris error along 
the line of sight produces an error in the range. Parkinson & Spilker Jr. (1996) estimate the 
rms ranging error as 2.1 meters (and the estimate now might be smaller). 


Satellite Clock Errors An atomic clock, with a rubidium or cesium oscillator, is correct 
to about 1 part in $10^{12}$. In a day the offset could reach $10^{-7}$ seconds; multiplied by $c$ this 
represents 26 meters. With clock corrections every 12 hours, an average error of 1 meter is 
reasonably conservative. 
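
The arithmetic behind the 26 meters is short enough to spell out. This Python sketch assumes the offset simply accumulates at the quoted rate of 1 part in $10^{12}$ over one day of 86 400 seconds.

```python
C = 299792458.0            # speed of light, m/s
stability = 1e-12          # relative accuracy of the atomic clock
seconds_per_day = 86400

offset = stability * seconds_per_day   # accumulated clock offset in seconds
range_error = C * offset               # equivalent range error in meters
print(offset, range_error)
```

The offset comes out just under $10^{-7}$ seconds, and the range error just under 26 meters.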


Ionosphere Errors GPS signals are delayed as they pass through the ionosphere, which 
starts 50 km above the Earth and extends to 1000 km or more. The delay is proportional 
to the number of electrons (integrated density along the signal path) and inversely proportional 
to $f^2$. Thus the effect is dispersive; it depends on the frequency $f$. The density of 
free electrons varies strongly with the time of day and the latitude. The variations from solar 
cycles and seasons and especially short-term effects are less strong but less predictable. 
If the delay were not accounted for at all, the ranging errors on the $L_1$ frequency in the 
zenith direction could reach 30 meters. The effects on the pseudorange $P$ and phase $\Phi$ are 
opposite in sign; the carrier phase is advanced. 

So we must estimate the ionospheric delay. A dual-frequency receiver can measure 
the pseudoranges $P_1$ and $P_2$ on both frequencies $L_1$ and $L_2$, and solve for the delay: 

$$dP_{ion} = \frac{f_2^2}{f_2^2 - f_1^2}(P_1 - P_2) + \text{random/unmodelled errors}. \tag{14.3}$$

This should be removed from $P_1$. Similarly the phase correction for ionospheric delay is 

$$d\Phi_{ion} = \frac{f_2^2}{f_2^2 - f_1^2}\bigl((\lambda_1 N_1 - \lambda_2 N_2) - (\Phi_1 - \Phi_2)\bigr) + \text{random/unmodelled errors}. \tag{14.4}$$

Equations (14.3) and (14.4) have equal value (in delay units) but opposite signs, so the 
ionosphere can be unambiguously calibrated by a combination of pseudorange and phase 
at both frequencies. 
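
As a numerical check of (14.3), the Python sketch below builds two synthetic pseudoranges whose ionospheric delays are proportional to $1/f^2$ and recovers the delay on $L_1$. The geometric range and the 5 meter delay are invented values, random errors are ignored, and the function name is hypothetical.

```python
F1 = 1575.42e6   # L1 frequency in Hz
F2 = 1227.60e6   # L2 frequency in Hz

def iono_delay_l1(p1, p2, f1=F1, f2=F2):
    """Ionospheric delay on P1 from equation (14.3), in the units of p1 and p2."""
    return f2 ** 2 / (f2 ** 2 - f1 ** 2) * (p1 - p2)

# Synthetic pseudoranges: same geometric range, delay proportional to 1/f^2.
geometric = 22_000_000.0                 # meters (invented)
delay_l1 = 5.0                           # meters on L1 (invented)
p1 = geometric + delay_l1
p2 = geometric + delay_l1 * (F1 / F2) ** 2

estimated = iono_delay_l1(p1, p2)
corrected_p1 = p1 - estimated
print(estimated, corrected_p1)
```

Because $f_2 < f_1$, both the factor in front and the difference $P_1 - P_2$ are negative, so the estimated delay is positive.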

The ambiguities $N_1$ and $N_2$ remain constant (but possibly unknown) if there are no 
cycle slips. So at least a differential delay is known. This estimate is good but there is 
often a better way. If you have a dual frequency phase receiver, the P code observations 
allow you to estimate the ionospheric correction. Then the improved pseudoranges can 
help resolve the ambiguities $N_1$ and $N_2$, completing the circle. 

For measurements at only one frequency, these formulas for $dP_{ion}$ and $d\Phi_{ion}$ are 
useless. In DGPS the ionospheric delay at two receivers is cancelled when we compute 
the (sufficiently short!) baseline between them. The difference in signal paths produces 
a slight baseline shortening, proportional to electron content and baseline length. One 
receiver at one frequency can use the prediction model for $dP_{ion}$ and $d\Phi_{ion}$ contained in 
the GPS broadcast message. Tests show better results than promised (but not great). 


Troposphere Errors The troposphere is the lower part of the atmosphere, thickest over the 
Equator (about 16 km). The temperature and pressure and humidity alter the speed of radio 
waves. Their effects are nearly independent of the radio frequency, but they depend on the 
time of passage. For a flat Earth we would divide the zenith delay (the delay at elevation 
angle $El = \pi/2$) by $\sin El$. There are a number of good mapping functions to improve this 
to a spherical-surface model. The M-file tropo uses a mapping function proposed by Goad 
& Goodman (1974) to compute the reduction. 
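
The flat-Earth rule is simple to evaluate. The Python sketch below is not the M-file tropo and does not implement the Goad and Goodman mapping function; it only divides a zenith delay by $\sin El$. The zenith value of 2.3 meters is the estimate quoted in this section, and the function name is hypothetical.

```python
import math

def flat_earth_tropo_delay(zenith_delay, elevation_deg):
    """Tropospheric delay in meters for a flat Earth: zenith delay / sin(El)."""
    return zenith_delay / math.sin(math.radians(elevation_deg))

for el in (90, 30, 15):
    print(el, round(flat_earth_tropo_delay(2.3, el), 2))
```

The delay doubles at an elevation of 30 degrees and grows quickly below that, which is one reason low satellites give less reliable pseudoranges.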

In the zenith direction, the total tropospheric delay is estimated as about 2.3 meters. 
The hydrostatic component (responsible for 90%) is the path integral of the density of 
moist air. The wet component is a function of water vapor density which is highly variable. 
It is questionable how descriptive actual measurements can be. The classical example is 
sitting in a fog bank only 50 m high. The other extreme is sitting in relatively dry air below 
a dark thundercloud. Both of these conditions are met in GPS surveys. 

In fact Duan et al. (1996) proposed and successfully demonstrated that in the reverse 
direction, the water vapor density could be measured by GPS! This is a beautiful example 
of an unexpected contribution coming from accurate measurements of time and distance. 
The electron content of the ionosphere can also be studied by GPS. 

The delay from liquid water in clouds and rain is well below 1cm. But models of 
the wet delay (water vapor) using surface meteorology are often wrong by more than 1 cm. 
Again we recommend the discussion by Langley in the Kleusberg-Teunissen book. 


Multipath Errors A GPS signal might follow several paths to a receiver’s antenna. The 
same signal arrives at different times and interferes with itself. This produces ghost images 
on TV (before cable) and corresponds to echoes for our voice. In GPS, the signal can be 
reflected from buildings or the ground and create a range error of several meters or more. 
Some authors would allow 10 meters for multipath error in C/A code measurements. 
Table 14.2 Standard Errors without SA 


Single frequency Double frequency 


Ephemeris data 
Satellite clock 
Ionosphere 
Troposphere 
Multipath 


Multipath is a serious problem because it is so difficult to model. Sometimes we 
can improve the site for the receiver. The design of the antenna is also critical. Large 
groundplanes, with various antenna elements (dipoles, microstrip), are the most common 
antidote for multipath. The receiver can be built with a narrow correlator to block the 
reflection, or with multiple correlators to allow estimation on several paths. And for a given 
satellite/static receiver pair, at a given time of day, we could try to estimate repeatable paths. 

The multipath errors in the phase observations $\Phi$ are much smaller, at the centimeter
level. We receive the sum of two signals. The reflected signal has a phase shift $\Delta\Phi$
and its magnitude is attenuated by a factor $\alpha$:

$$\text{received signal} = A\cos\Phi + \alpha A\cos(\Phi + \Delta\Phi).$$

The multipath error, comparing the phase of this sum to the correct value $\Phi$, is

$$d\Phi = \arctan\left(\frac{\sin\Delta\Phi}{\alpha^{-1} + \cos\Delta\Phi}\right).$$

The worst case has no attenuation ($\alpha = 1$) and $d\Phi = 90^\circ$. This means only a
quarter-wavelength error (5 cm) from multipath.
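
The phase error formula is easy to evaluate numerically. The following Python sketch is only an illustration: the function name and the sample attenuation values are assumptions, not part of the book's M-files. It uses the equivalent form $\tan d\Phi = \alpha\sin\Delta\Phi / (1 + \alpha\cos\Delta\Phi)$, which avoids dividing by $\alpha$.

```python
import math

def multipath_phase_error(alpha, delta_phi):
    """Phase error d_phi (radians) of a direct signal plus one reflection.

    alpha     -- attenuation factor of the reflected signal (0 <= alpha <= 1)
    delta_phi -- phase shift of the reflected signal in radians
    """
    # tan(d_phi) = alpha*sin(delta_phi) / (1 + alpha*cos(delta_phi)),
    # the same as sin(delta_phi) / (1/alpha + cos(delta_phi)).
    return math.atan2(alpha * math.sin(delta_phi),
                      1.0 + alpha * math.cos(delta_phi))

WAVELENGTH_L1 = 299792458.0 / 1575.42e6  # metres

# Largest error over a grid of phase shifts, for a few sample attenuations.
for alpha in (0.1, 0.5, 1.0):
    worst = max(multipath_phase_error(alpha, math.radians(d))
                for d in range(0, 180))
    range_error = worst / (2.0 * math.pi) * WAVELENGTH_L1
    print(alpha, math.degrees(worst), range_error)
```

With $\alpha = 1$ the error equals $\Delta\Phi/2$, so it approaches the quarter-wavelength bound as $\Delta\Phi$ approaches $180^\circ$; weaker reflections give smaller errors.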


Other sources of error exist! Receivers are not perfect but they are continually improving. 
Of course the Earth is moving up and down (there are tides on dry ground just as in the sea, 
but smaller). These tides can be accounted for almost completely. The actual position of the 
Earth’s surface and of sea level affects vertical positioning by GPS, which is less accurate 
than horizontal positioning. As a limitation on vertical accuracy for precise positioning, 
tides are less difficult than the troposphere and antenna problems (multipath and phase- 
center variations). But still we can land an airplane. 


Summary of the Error Budget 


Table 14.2 gives approximate rms errors ignoring selective availability. (Otherwise SA 
would dominate everything.) The error sources are reasonably independent, so the square 
root of sum of squares is the UERE—the user equivalent range error. This is multiplied by 
the Dilution of Precision (say VDOP = 2.5 and HDOP = 2) to give the standard deviation
in position, in other words, the one sigma error.

This overall range error multiplied by the DOP factor in Section 14.4 is roughly 
10 meters for a single frequency civilian C/A receiver. A dual-frequency P/Y code re- 
ceiver would experience roughly half that error, when ionospheric delay is cancelled and 
the ephemeris and clock errors become the largest. Elsewhere we discuss the double
differences and long-time averages and Kalman filtering that reduce the position error to
centimeters and even millimeters.
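
The arithmetic of the error budget can be sketched in a few lines of Python. The individual error values below are hypothetical placeholders chosen for illustration; they are not the entries of Table 14.2. Only the combination rule (root sum of squares, then multiplication by a DOP factor) comes from the text.

```python
import math

# Hypothetical one-sigma range errors in metres (illustrative values only).
errors_m = {
    "ephemeris": 2.0,
    "satellite clock": 2.0,
    "ionosphere": 4.0,
    "troposphere": 0.7,
    "multipath": 1.5,
}

def uere(errors):
    """User equivalent range error: root sum of squares of independent errors."""
    return math.sqrt(sum(value * value for value in errors.values()))

def position_sigma(uere_m, dop):
    """One-sigma position error: range error multiplied by a DOP factor."""
    return dop * uere_m

u = uere(errors_m)
print("UERE:", u)
print("horizontal sigma (HDOP = 2):", position_sigma(u, 2.0))
print("vertical sigma (VDOP = 2.5):", position_sigma(u, 2.5))
```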


14.3 Description of the System 


The traditional observable in positioning and navigation has been the angle. This quantity 
is fundamental to geodesy. But since the 1960’s geodesists additionally have used the elec- 
tronically measured distance in their science. For GPS, distance is the basic observable. 
Thus triangulation is being largely replaced by trilateration. 

From a geodetical point of view a key feature of GPS is that the user can calculate 
coordinates to all visible satellites at any time. The calculation is done according to a 
complex algorithm, described in Chapter 15. The input is derived from the satellite signals; 
the receiver immediately finds the satellite coordinates. These coordinates relate to an 
Earth centered and Earth fixed (ECEF) system. The geocentric coordinates of the receiver 
are also related to this system. Any good receiver provides the position within 2 min. from 
being switched on! 

Conceptually the satellites may be looked upon as points with known coordinates in 
space, continuously emitting information on their position. The receiver measures its
distance to each satellite. In the field, the receiver coordinates $(X_i, Y_i, Z_i)$ can be determined
with an accuracy varying from 20 m to 100 m. The accuracy can be greatly improved by us- 
ing offset values from a fixed receiver, whose position is known. This method is known as 
differential GPS or DGPS. Geodesy deals exclusively with DGPS and typically it achieves 
accuracies at the level of centimeters. 

The frequency $f_0 = 10.23$ MHz is fundamental for GPS. Each satellite transmits
carrier waves on two frequencies (this is in 1997; a new civilian frequency L5 is under
discussion). The L1 signal uses the frequency $f_1 = 154 f_0 = 1575.42$ MHz with
wavelength $\lambda_1 = 0.1903$ m and the L2 signal uses frequency $f_2 = 120 f_0 = 1227.60$ MHz with
wavelength $\lambda_2 = 0.2442$ m. The two frequencies are coherent because 154 and 120 are
integers. L1 carries a precise code and a coarse/acquisition code. L2 carries the precise P
code only. A navigation data message is superimposed on all these codes.
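
As a quick check of these numbers, the carrier frequencies and wavelengths follow directly from $f_0$ and the speed of light. This is a minimal Python sketch; the constant names are chosen here for illustration.

```python
C = 299792458.0   # vacuum speed of light in m/s
F0 = 10.23e6      # fundamental GPS frequency in Hz

f1 = 154 * F0     # L1 carrier frequency in Hz
f2 = 120 * F0     # L2 carrier frequency in Hz

lambda1 = C / f1  # L1 wavelength in metres
lambda2 = C / f2  # L2 wavelength in metres

print(f1 / 1e6, "MHz", round(lambda1, 4), "m")
print(f2 / 1e6, "MHz", round(lambda2, 4), "m")
```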

The most accurate distances are computed from phase observation. This consists of 
the difference in phase of the incoming satellite signal and a receiver generated signal with 
the same frequency. The phase difference is measured by a phase meter and often with a 
resolution of $10^{-3}$ cycle or better. The initial observation only consists of the fractional
part of the phase difference. When tracking is continued without loss of lock we still 
record a fractional part plus the integer number of cycles since the initial epoch. But the 
observation does not provide us with the initial integer, the ambiguity $N$.

The resolution of this ambiguity (counting the cycles without slips) is a crucial
problem for GPS. This is particularly true for short observations, a few hours or less, of
phases that are doubly differenced. The ambiguity problem is irrelevant for one-ways and
single differenced observations because of the non-cancelling term $\varphi_i(t_0)$ in (14.16).

Precise GPS solutions depend highly on how well clock errors in satellites and re- 
ceiver are eliminated. Realizing that light travels 3 mm in 0.01 nanoseconds it is evident 
that usable GPS observations on millimeter level require that we can keep time to within 
nanosecond level. 

A standard observational method consists of phase observations by two or more re- 
ceivers of four or more satellites at receiver epochs. An epoch is an instant of time. For 
double differenced observations we only need to know the sampling epoch to within a 
half microsecond. This can almost always be obtained from P code pseudoranges, even 
with SA. Double differences require only good short-term stability of the receiver oscilla- 
tor; the long-term stability need be no better than your wrist watch, see Table 14.1. 

The original observation is a (one-way) difference between a receiver and a satel- 
lite. Next we make differences between two receivers and one satellite to eliminate errors 
common to the satellite. The most important error is from the satellite clock because of 
the dithering by SA. Next we make differences between two receivers and two satellites— 
double differences. This double difference eliminates errors that are common to both 
receivers. The largest errors are receiver clock offsets. Receiver clock drifts are not elimi- 
nated. 

The differencing technique is effective for repairing a not too perfect synchronization 
between the two receivers. The serious multipath errors do not cancel as they depend on 
the specific reflecting surface at the receiver. Better antenna design and better signal
processing are goals of current research.

The differencing may be continued: We can form the difference of two double dif- 
ferences between two epochs. The resulting triple differences (over receivers, satellites, 
and time) eliminate the ambiguities N. These are constant over time. But triple differences 
lose geometric strength because of the differencing over time. They are highly correlated 
and numerically less stable. 

The GPS signals may be interrupted by buildings or other constructions causing cy- 
cle slips. When lock of signal is established again, the integer ambiguity most likely is 
wrong and has to be determined anew. Today detection of cycle slips is based on efficient 
algorithms which assume that the epoch-to-epoch ionospheric changes are small. The
combination $\lambda_1 N_1 - \lambda_2 N_2$ therefore is sensitive to cycle slips. For time intervals smaller than
a few seconds, the ionospheric variations are at the subcentimeter level. Cycle slips result
in distance errors that are multiples of the wavelength $\lambda$, too large for serious science.

Signals at the high frequencies L1 and L2 propagate relatively easily through the
ionosphere. The ionospheric delay is inversely proportional to the square of the frequency. 
For dual frequency receivers this is utilized to eliminate most of the ionospheric delay. 
If we succeed in estimating the correct integer ambiguity value N we talk about a fixed 
solution (of the ambiguity). A float solution will have an incorrect and non-integer N. 
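
Because the delay scales as $1/f^2$, two pseudoranges on different frequencies determine both the range and the first-order ionospheric delay. The Python sketch below assumes the simple noise-free model $P_1 = \rho + I$ and $P_2 = \rho + (f_1/f_2)^2 I$; the function name and the sample values are hypothetical.

```python
F0 = 10.23e6
F1, F2 = 154 * F0, 120 * F0
GAMMA = (F1 / F2) ** 2          # ratio of the L2 delay to the L1 delay

def ionosphere_free(p1, p2):
    """Return (range free of first-order ionosphere, ionospheric delay on L1).

    Assumes p1 = rho + I and p2 = rho + GAMMA * I, all in metres.
    """
    iono_l1 = (p2 - p1) / (GAMMA - 1.0)
    return p1 - iono_l1, iono_l1

# Hypothetical range and L1 delay in metres.
rho_true, iono_true = 21345678.9, 4.2
p1 = rho_true + iono_true
p2 = rho_true + GAMMA * iono_true
print(ionosphere_free(p1, p2))
```

Only the first-order (dispersive) part of the delay is removed this way, and the combination amplifies the measurement noise of the two pseudoranges.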

The final step of the data processing is a least squares estimation of point coordinates 
based on the estimated vectors. The estimation yields an assessment of quality of the 
observations and a possible detection of gross errors. The result of a GPS survey is a
network of points whose position is controlled through a least squares estimation. 

On these few pages we tried to give a brief introduction to how GPS is used for 
positional purposes. The following pages focus on computational aspects of GPS. 


14.4 Receiver Position From Code Observations 


The basic equation for a code observation (distance but not phase) looks like

$$P_i^k(t) = \rho_i^k + I_i^k + T_i^k + c\big(dt^k(t - \tau_i^k) - dt_i(t)\big) - e_i^k. \tag{14.5}$$

The ionospheric delay is $I_i^k$, the tropospheric delay is $T_i^k$, and $c$ denotes the vacuum speed
of light. The clock offsets are $dt^k$ for the satellite and $dt_i$ for the receiver; and $e_i^k$ denotes
an error. The distance between satellite $k$ and receiver $i$, corrected for Earth rotation, is
defined by

$$\rho_i^k = \big\| R_3(\omega_e\tau_i^k)\, r^k(t - \tau_i^k)_{\text{geo}} - r_i(t)_{\text{ECEF}} \big\|. \tag{14.6}$$

The matrix $R_3$ accounts for rotation by the angle $\omega_e\tau_i^k$ while the signal is traveling:

$$R_3(\omega_e\tau_i^k) = \begin{bmatrix} \cos(\omega_e\tau_i^k) & \sin(\omega_e\tau_i^k) & 0 \\ -\sin(\omega_e\tau_i^k) & \cos(\omega_e\tau_i^k) & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{14.7}$$

The rotation matrix is necessary when using vectors referenced to an Earth centered and
Earth fixed system (ECEF). The travel time from (the signal generator in) the satellite $k$ to
(the signal correlator in) the receiver $i$ is denoted $\tau_i^k$. The rotation rate of the Earth is $\omega_e$.
Position vectors in the ECEF system are denoted $r(t)_{\text{ECEF}}$. The argument $t$ emphasizes
dependence on time.
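
The Earth rotation correction in (14.6) and (14.7) can be sketched as follows. This Python fragment is an illustration under simplifying assumptions: the satellite position is taken as already computed for the transmission time, the travel time is found by a short fixed-point iteration, and the function names are hypothetical.

```python
import math

OMEGA_E = 7.292115e-5   # rotation rate of the Earth in rad/s
C = 299792458.0         # vacuum speed of light in m/s

def rotate_satellite(sat_xyz, travel_time):
    """Apply R3(omega_e * tau) from (14.7) to a satellite position."""
    angle = OMEGA_E * travel_time
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    x, y, z = sat_xyz
    return (cos_a * x + sin_a * y, -sin_a * x + cos_a * y, z)

def corrected_range(sat_xyz, rec_xyz, iterations=3):
    """Distance (14.6) and travel time; tau and rho are refined together."""
    tau = 0.075  # initial guess for the travel time in seconds
    for _ in range(iterations):
        rho = math.dist(rotate_satellite(sat_xyz, tau), rec_xyz)
        tau = rho / C
    return rho, tau

# Hypothetical satellite and receiver positions in metres.
rho, tau = corrected_range((15.0e6, -15.0e6, 16.0e6),
                           (1.113e6, -4.843e6, 3.983e6))
print(rho, tau)
```

During a travel time of roughly 70 ms the Earth turns by about $5 \times 10^{-6}$ rad, which moves a satellite position by more than 100 m in the ECEF frame, so the correction cannot be ignored.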

Later we only deal with differenced observations from which most of the systematic 
errors are eliminated. With the eventual application in mind, we simply omit those errors. 


Example 14.1 We want to demonstrate the practical use of the code observation equation
(14.5). We linearize it. The Jacobian acts several times as observation matrix in a least
squares estimation of receiver coordinates and receiver clock offset. Let

$$\begin{bmatrix} X^k \\ Y^k \\ Z^k \end{bmatrix} = R_3(\omega_e\tau_i^k)\, r^k(t - \tau_i^k)_{\text{geo}} \quad\text{and}\quad \begin{bmatrix} X_i \\ Y_i \\ Z_i \end{bmatrix} = r_i(t)_{\text{ECEF}}.$$

Omitting the refraction terms $I_i^k$ and $T_i^k$, equation (14.5) now linearizes as

$$-\frac{X^k - X_i^0}{(\rho_i^k)^0}\,x_i - \frac{Y^k - Y_i^0}{(\rho_i^k)^0}\,y_i - \frac{Z^k - Z_i^0}{(\rho_i^k)^0}\,z_i + 1\,(c\,dt_i) = P_{i,\text{obs}}^k - (P_i^k)^0 - e_i^k = b_i - e_i^k. \tag{14.8}$$

In a first approximation we put $\rho_i^k \approx (\rho_i^k)^0 = \sqrt{(X^k - X_i^0)^2 + (Y^k - Y_i^0)^2 + (Z^k - Z_i^0)^2}$, which is the geometric distance
as calculated from the preliminary coordinates of satellite and receiver. The number $b_i$
denotes the correction to this preliminary value.
We assume that the code observation $P_i^k$ is corrected for the clock offset $dt^k$ of the
satellite according to the broadcast ephemerides. The preliminary value of $P_i^k$ is calculated
from the preliminary coordinates $(X_i^0, Y_i^0, Z_i^0)$ of the receiver. It is corrected for receiver
clock offset $dt_i$, with first guess $dt_i = 0$. If the preliminary values are good the right
side $b_i$ is small. Note that the direction cosines use the components described by (14.6).
The satellite coordinates must be rotated by the angle $\omega_e\tau_i^k$ around the 3-axis before they
can be used as ECEF coordinates.

When more than four observations are available we estimate the position through a 
least squares procedure. It is reasonable to assume that the observations are uncorrelated. 
Then the weight matrix $C = \sigma_0^{-2} I$ is diagonal.

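The linearized observation equations are of the type described by (14.8) and we
arrange the unknowns as $x = (x_i, y_i, z_i, c\,dt_i)$:

$$Ax = \begin{bmatrix} -\dfrac{X^1 - X_i^0}{(\rho_i^1)^0} & -\dfrac{Y^1 - Y_i^0}{(\rho_i^1)^0} & -\dfrac{Z^1 - Z_i^0}{(\rho_i^1)^0} & 1 \\ -\dfrac{X^2 - X_i^0}{(\rho_i^2)^0} & -\dfrac{Y^2 - Y_i^0}{(\rho_i^2)^0} & -\dfrac{Z^2 - Z_i^0}{(\rho_i^2)^0} & 1 \\ \vdots & \vdots & \vdots & \vdots \\ -\dfrac{X^m - X_i^0}{(\rho_i^m)^0} & -\dfrac{Y^m - Y_i^0}{(\rho_i^m)^0} & -\dfrac{Z^m - Z_i^0}{(\rho_i^m)^0} & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ z_i \\ c\,dt_i \end{bmatrix} = b - e. \tag{14.9}$$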

Note that the factors $c$ and $dt_i$ stay together in one product. This is done for numerical
reasons. The unknown $c\,dt_i$ has the same dimension as the other unknowns, namely length.
However some people like to estimate the term as time. This can be done by using the
unknown $dt_i$ together with a reasonably small coefficient like $c \times 10^{-9}$ yielding $dt_i$ in
nanoseconds.

Row-wise the first three columns of A contain the direction cosines for the vector 
between satellite and receiver. Note that all entries of A have magnitude less than or equal 
to one. The least squares solution is 


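$$\hat{x} = (A^{\mathrm T}\Sigma^{-1}A)^{-1}A^{\mathrm T}\Sigma^{-1}b. \tag{14.10}$$

The code observations are considered independent with equal variance so $e \sim N(0, \sigma_0^2 I)$.
In other words the vector $e$ has zero mean and covariance matrix $\sigma_0^2 I$. If this assumption
is correct, (14.10) is simplified to

$$\hat{x} = \begin{bmatrix} \hat{x}_i \\ \hat{y}_i \\ \hat{z}_i \\ c\,\widehat{dt}_i \end{bmatrix} = (A^{\mathrm T}A)^{-1}A^{\mathrm T}b. \tag{14.11}$$

The final receiver coordinates are $\hat{X}_i = X_i^0 + \hat{x}_i$, $\hat{Y}_i = Y_i^0 + \hat{y}_i$, and $\hat{Z}_i = Z_i^0 + \hat{z}_i$.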
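
The whole of Example 14.1 can be sketched as a short iteration in Python. This is a minimal illustration, not the book's M-file: it assumes equal weights as in (14.11), pseudoranges already corrected for satellite clock and refraction, satellite coordinates already rotated into the ECEF frame, and the sign convention of (14.8) in which the receiver clock term enters as $+\,c\,dt_i$. All names and numbers are hypothetical.

```python
import math

def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (a[i][n] - tail) / a[i][i]
    return x

def position_fix(sats, pseudoranges, x0=(0.0, 0.0, 0.0), iterations=8):
    """Iterate the linearized equations (14.8) with equal weights (14.11)."""
    X, Y, Z = x0
    cdt = 0.0
    for _ in range(iterations):
        A, b = [], []
        for (xs, ys, zs), P in zip(sats, pseudoranges):
            rho0 = math.sqrt((xs - X) ** 2 + (ys - Y) ** 2 + (zs - Z) ** 2)
            A.append([-(xs - X) / rho0, -(ys - Y) / rho0, -(zs - Z) / rho0, 1.0])
            b.append(P - rho0 - cdt)
        # Normal equations A^T A x = A^T b
        AtA = [[sum(r[i] * r[j] for r in A) for j in range(4)] for i in range(4)]
        Atb = [sum(r[i] * bi for r, bi in zip(A, b)) for i in range(4)]
        dx, dy, dz, dcdt = solve(AtA, Atb)
        X, Y, Z, cdt = X + dx, Y + dy, Z + dz, cdt + dcdt
    return (X, Y, Z), cdt

# Synthetic test case: hypothetical receiver, satellites, and clock offset.
truth = (1113000.0, -4843000.0, 3983000.0)
sats = [(15.0e6, -15.0e6, 16.0e6), (-5.0e6, -20.0e6, 16.8e6),
        (8.0e6, -25.0e6, 4.0e6), (-12.0e6, -12.0e6, 20.4e6),
        (20.0e6, -5.0e6, 16.8e6)]
cdt_true = 30000.0  # metres, about 100 microseconds
obs = [math.dist(s, truth) + cdt_true for s in sats]
start = (truth[0] + 1000.0, truth[1] - 2000.0, truth[2] + 1500.0)
print(position_fix(sats, obs, x0=start))
```

Each pass recomputes the direction cosines from the latest coordinates, which is why the Jacobian "acts several times" as observation matrix.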




Example 14.2 (Dilution of Precision) The covariance matrix $\Sigma_{\text{ECEF}}$ for $(x_i, y_i, z_i, c\,dt_i)$
contains information about the geometric quality of the position determination. It is smaller
(and $\hat{x}$ is more accurate) when the satellites are well spaced. The trace of $\Sigma_{\text{ECEF}}$ is a
very compressed form of information (only a single number) and we cannot recover the
confidence ellipsoid from it. According to its definition the trace is the sum of the four
variances $\sigma_X^2 + \sigma_Y^2 + \sigma_Z^2 + \sigma_{cdt}^2$ and it is independent of the coordinate system. In addition to
the information on the geometry these variances involve the accuracy of the observations.
This can be eliminated by division by $\sigma_0^2$.

We start from the covariance matrix of the least squares problem (14.9):

$$\Sigma_{\text{ECEF}} = \begin{bmatrix} \sigma_X^2 & \sigma_{XY} & \sigma_{XZ} & \sigma_{X,cdt} \\ \sigma_{YX} & \sigma_Y^2 & \sigma_{YZ} & \sigma_{Y,cdt} \\ \sigma_{ZX} & \sigma_{ZY} & \sigma_Z^2 & \sigma_{Z,cdt} \\ \sigma_{cdt,X} & \sigma_{cdt,Y} & \sigma_{cdt,Z} & \sigma_{cdt}^2 \end{bmatrix}. \tag{14.12}$$


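The propagation law transforms $\Sigma_{\text{ECEF}}$ into the covariance matrix expressed in a local
system with coordinates $(E, N, U)$. The interesting 3 by 3 submatrix $S$ of $\Sigma_{\text{ECEF}}$ is the upper
left block in (14.12). After the transformation $F$, the submatrix becomes

$$\Sigma_{\text{ENU}} = \begin{bmatrix} \sigma_E^2 & \sigma_{EN} & \sigma_{EU} \\ \sigma_{NE} & \sigma_N^2 & \sigma_{NU} \\ \sigma_{UE} & \sigma_{UN} & \sigma_U^2 \end{bmatrix} = F^{\mathrm T} S F. \tag{14.13}$$

The matrix $F^{\mathrm T}$ connects Cartesian coordinate differences in the local system (at latitude $\varphi$
and longitude $\lambda$) and the ECEF system. The sequence $(E, N, U)$ assures that both the local
and the ECEF systems shall be right handed:

$$F^{\mathrm T} = R_3(-\tfrac{\pi}{2})\, R_2(\varphi - \tfrac{\pi}{2})\, R_3(\lambda - \pi)$$

$$= \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \sin\varphi & 0 & \cos\varphi \\ 0 & 1 & 0 \\ -\cos\varphi & 0 & \sin\varphi \end{bmatrix} \begin{bmatrix} -\cos\lambda & -\sin\lambda & 0 \\ \sin\lambda & -\cos\lambda & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

$$= \begin{bmatrix} -\sin\lambda & \cos\lambda & 0 \\ -\sin\varphi\cos\lambda & -\sin\varphi\sin\lambda & \cos\varphi \\ \cos\varphi\cos\lambda & \cos\varphi\sin\lambda & \sin\varphi \end{bmatrix}. \tag{14.14}$$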

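In practice we meet several forms of the dilution of precision (abbreviated DOP):

Geometric: $\text{GDOP} = \sqrt{\dfrac{\sigma_X^2 + \sigma_Y^2 + \sigma_Z^2 + \sigma_{cdt}^2}{\sigma_0^2}} = \sqrt{\dfrac{\operatorname{tr}(\Sigma_{\text{ECEF}})}{\sigma_0^2}}$

Horizontal: $\text{HDOP} = \sqrt{\dfrac{\sigma_E^2 + \sigma_N^2}{\sigma_0^2}}$

Position: $\text{PDOP} = \sqrt{\dfrac{\sigma_X^2 + \sigma_Y^2 + \sigma_Z^2}{\sigma_0^2}} = \sqrt{\dfrac{\sigma_E^2 + \sigma_N^2 + \sigma_U^2}{\sigma_0^2}} = \sqrt{\dfrac{\operatorname{tr}(\Sigma_{\text{ENU}})}{\sigma_0^2}}$

Time: $\text{TDOP} = \sigma_{cdt}/\sigma_0$

Vertical: $\text{VDOP} = \sigma_U/\sigma_0$.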


Note that all DOP values are dimensionless. They multiply the range errors to give the 
position errors (approximately). Furthermore we have 


$$\text{GDOP}^2 = \text{PDOP}^2 + \text{TDOP}^2 = \text{HDOP}^2 + \text{VDOP}^2 + \text{TDOP}^2.$$
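
The DOP values depend only on the geometry matrix $A$ of (14.9) and on the receiver's latitude and longitude. The Python sketch below is an illustration under the equal-variance assumption, so that $(A^{\mathrm T}A)^{-1} = \Sigma_{\text{ECEF}}/\sigma_0^2$. The helper names and the sample matrix are hypothetical, and the small matrix routines stand in for a linear algebra library.

```python
import math

def inverse(M):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(M)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dops(A, lat, lon):
    """DOP values from the geometry matrix A; lat and lon in radians."""
    Q = inverse(matmul(transpose(A), A))      # Sigma_ECEF / sigma_0^2
    S = [row[:3] for row in Q[:3]]            # position block of (14.12)
    sp, cp = math.sin(lat), math.cos(lat)
    sl, cl = math.sin(lon), math.cos(lon)
    Ft = [[-sl, cl, 0.0],                     # F^T from (14.14)
          [-sp * cl, -sp * sl, cp],
          [cp * cl, cp * sl, sp]]
    enu = matmul(matmul(Ft, S), transpose(Ft))  # (14.13)
    return {"GDOP": math.sqrt(sum(Q[i][i] for i in range(4))),
            "PDOP": math.sqrt(sum(S[i][i] for i in range(3))),
            "HDOP": math.sqrt(enu[0][0] + enu[1][1]),
            "VDOP": math.sqrt(enu[2][2]),
            "TDOP": math.sqrt(Q[3][3])}

# Hypothetical geometry: rows are (-direction cosines, 1) for five satellites.
A = [[-1.0, 0.0, 0.0, 1.0], [0.0, -1.0, 0.0, 1.0], [0.0, 0.0, -1.0, 1.0],
     [-0.6, -0.8, 0.0, 1.0], [0.0, 0.6, -0.8, 1.0]]
print(dops(A, math.radians(40.0), math.radians(-83.0)))
```

Because the trace is invariant under the rotation $F$, PDOP is the same in both coordinate systems, which is what makes the identity above hold.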


The DOP values are especially useful when planning the observational periods. For this 
purpose it is better to use almanacs without high accuracy, rather than transmitted ephemer- 
ides. Almanac data allow for precomputation of satellite positions over several months and 
with sufficient accuracy (the ephemerides representation of the orbits is valid over a short
period of time). Some satellite constellations are better than others and the knowledge of 
the time of best satellite coverage is a useful tool for anybody using GPS. 

Experience shows that good observations are achieved when PDOP < 5 and mea- 
surements come from at least five satellites. 


14.5 Combined Code and Phase Observations 


GPS observations are characterized by a multitude of data collected at short time intervals 
varying from 1 s to say 30s. The data are processed either in real time or in post processing 
mode by means of least squares or the filtering techniques in Chapter 15. Some compu- 
tational methods focus on achieving high accuracies; they require longer times. Other 
methods concentrate on short processing time or real time availability. 

Today the best geodetic receivers deliver dual frequency P code and phase observa- 
tions. We shall restrict ourselves to the latest methods for processing data observed with 
static antennas. The accuracy is greatest with post processing. 

A popular procedure for processing GPS observations uses double differences. Such 
double differences are quite insensitive to shared changes of position of the two receivers, 
but they are very sensitive to changes of one receiver relative to the other. Therefore, 
double differences resemble classical distance and direction observations. 

In order to separate geometry, i.e. the vector, from the ambiguities in the cycle count, 
we want a lot of data. With no a priori knowledge about the vector, it is difficult within a 
short period of time to distinguish the vector from the ambiguities. But as time goes by, 
hopefully only one vector fits the double differenced observations. The more satellites that 
are observed, the faster the vector can be determined. As soon as the vector is uniquely 
determined within a fraction of a cycle, an ambiguity can be fixed to its integer value. This 
is the key to optimal use of double differenced observations. 

Usually the ambiguities and the vector are estimated according to the least-squares 
principle. That is, the best estimates of the ambiguities and the vector are those values 
which minimize the sum of squared residuals. Ambiguities are often treated as reals! As
the systematic errors are eliminated, the ambiguities tend to integer values. The classical
case involves short baselines for which ambiguities are safely determined.

Figure 14.2 Double difference $\rho_i^k - \rho_i^l - \rho_j^k + \rho_j^l$. Two receivers observe two satellites
at the same time.

For reference, we shall use the terminology already introduced by Yang & Goad &
Schaffrin (1994). The ideal pseudorange $\rho^*$ is a combination of all nondispersive clock-based
terms:

$$\rho_i^{k*} = \rho_i^k(t, t - \tau_i^k) + T_i^k + c\big(dt_i(t) - dt^k(t - \tau_i^k)\big). \tag{14.15}$$

If the ionospheric effect $I_i^k$ were zero, $\rho^*$ would be identical to the pseudorange $P$. The
term $T_i^k$ denotes the tropospheric delay. The terms in parentheses cancel when we use
double differenced observations, because these are clock errors.

The basic equation for a phase observation at the same frequency as pseudorange is

$$\Phi_i^k(t) = \rho_i^k - I_i^k + T_i^k + c\big(dt_i(t) - dt^k(t - \tau_i^k)\big) + \lambda\big(\varphi_i(t_0) - \varphi^k(t_0)\big) + \lambda N_i^k - \epsilon_i^k. \tag{14.16}$$

The new terms are the ambiguities $N_i^k$ between satellite $k$ and receiver $i$, and the non-zero
initial phases $\varphi^k(t_0)$ and $\varphi_i(t_0)$. Again we introduce the ideal pseudorange given above:

$$\Phi_i^k(t) = \rho_i^{k*} - I_i^k + \lambda N_i^{k*} - \epsilon_i^k \tag{14.17}$$

where $N_i^{k*} = N_i^k + \varphi_i(t_0) - \varphi^k(t_0)$. For double differences the two $\varphi$ terms and the two
$dt$ terms cancel. This means that in case of double differences $N_{ij}^{kl*} = N_{ij}^{kl}$. This applies
below in (14.20) and (14.21).

We shall demonstrate in detail how to make double differences of the observations. 
The original observation is a (one-way) difference between a receiver and a satellite. To 
eliminate the satellite clock error we make differences between two receivers and one satel- 
lite. Finally the double difference between two receivers and two satellites also eliminates 
receiver clock errors: 


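on L1: $$P_{1,ij}^{kl} = \rho_i^k - \rho_i^l - \rho_j^k + \rho_j^l + I_{ij}^{kl} + T_{ij}^{kl} - e_{1,ij}^{kl} \tag{14.18}$$

on L2: $$P_{2,ij}^{kl} = \rho_i^k - \rho_i^l - \rho_j^k + \rho_j^l + (f_1/f_2)^2 I_{ij}^{kl} + T_{ij}^{kl} - e_{2,ij}^{kl}. \tag{14.19}$$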




Explicitly $T_{ij}^{kl} = (T_i^k - T_i^l) - (T_j^k - T_j^l)$, and similarly for $I_{ij}^{kl}$, $N_{ij}^{kl}$, and $e_{ij}^{kl}$. Subscripts 1
and 2 refer to L1 and L2 with frequencies $f_1$ and $f_2$. In order to emphasize the influence
of geometry we have left the $\rho$ terms uncombined; they are also double differences. So are
the phase observations:

$$\Phi_{1,ij}^{kl} = \rho_i^k - \rho_i^l - \rho_j^k + \rho_j^l - I_{ij}^{kl} + T_{ij}^{kl} + \lambda_1 N_{1,ij}^{kl} - \epsilon_{1,ij}^{kl} \tag{14.20}$$

$$\Phi_{2,ij}^{kl} = \rho_i^k - \rho_i^l - \rho_j^k + \rho_j^l - (f_1/f_2)^2 I_{ij}^{kl} + T_{ij}^{kl} + \lambda_2 N_{2,ij}^{kl} - \epsilon_{2,ij}^{kl}. \tag{14.21}$$

The ionospheric delay is frequency-dependent (dispersive); the factor $(f_1/f_2)^2$ multiplies $I$
for the L2 observations. The group delay is connected to the distances $P$ while the phase
advance is connected to the phase observations $\Phi$. Thus, we see a reversed sign for $I$ in
(14.20) and (14.21). All observational errors are included in the $e$ and $\epsilon$ terms.

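The rest of this section deals exclusively with double differenced observations. We
also omit the subscript and superscripts related to the receivers and satellites, since there
are exactly two of each:

$$\begin{aligned} P_1 &= \rho^* + I - e_1 \\ \Phi_1 &= \rho^* - I + \lambda_1 N_1 - \epsilon_1 \\ P_2 &= \rho^* + (f_1/f_2)^2 I - e_2 \\ \Phi_2 &= \rho^* - (f_1/f_2)^2 I + \lambda_2 N_2 - \epsilon_2. \end{aligned} \tag{14.22}$$

Actually, we have $f_1/f_2 = 154/120 = 1.283333\ldots$. Equation (14.22) is transformed by
Yang & Goad & Schaffrin (1994) into the elegant matrix equation

$$\begin{bmatrix} P_1 \\ \Phi_1 \\ P_2 \\ \Phi_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & \lambda_1 & 0 \\ 1 & (f_1/f_2)^2 & 0 & 0 \\ 1 & -(f_1/f_2)^2 & 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} \rho^* \\ I \\ N_1 \\ N_2 \end{bmatrix} - \begin{bmatrix} e_1 \\ \epsilon_1 \\ e_2 \\ \epsilon_2 \end{bmatrix}. \tag{14.23}$$

When all $e$ and $\epsilon$ values are set to zero, we can solve the four equations to find the four
unknowns. This determines the ideal pseudorange $\rho^*$, the instantaneous ionospheric delay $I$,
and the ambiguities $N_1$ and $N_2$.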

The standard deviation of phase noise $\epsilon$ is a few millimeters, while standard deviation
of the pseudorange error $e$ depends on the quality of the receiver. L1 C/A code
pseudoranges can have noise values up to 2-3 m. This is due to the slow chipping rate (which is
another term for frequency). The P code has a frequency of 10.23 MHz, i.e. a sequence of
10.23 million binary digits or chips per second. The chipping rate of the P code is ten times
that of the C/A code and this implies a noise level of 10-30 cm. A small pseudorange standard
deviation is critical to quickly determining the ambiguities on L1 or L2.
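
With the noise terms set to zero, the four equations of (14.23) can be solved by simple substitution: the two code equations give $I$ and $\rho^*$, and each phase equation then gives one ambiguity. The Python sketch below is an illustration only; the function name and the synthetic values are hypothetical.

```python
C = 299792458.0
F1, F2 = 154 * 10.23e6, 120 * 10.23e6
LAM1, LAM2 = C / F1, C / F2
GAMMA = (F1 / F2) ** 2

def solve_noise_free(P1, Phi1, P2, Phi2):
    """Solve (14.23) for (rho_star, I, N1, N2) with all noise terms zero."""
    iono = (P2 - P1) / (GAMMA - 1.0)            # from the two code equations
    rho_star = P1 - iono
    n1 = (Phi1 - rho_star + iono) / LAM1        # from the L1 phase equation
    n2 = (Phi2 - rho_star + GAMMA * iono) / LAM2  # from the L2 phase equation
    return rho_star, iono, n1, n2

# Synthetic double differenced observations built from known values.
rho_star, iono, N1, N2 = 1234.567, 0.25, 7, -3
obs = (rho_star + iono,
       rho_star - iono + LAM1 * N1,
       rho_star + GAMMA * iono,
       rho_star - GAMMA * iono + LAM2 * N2)
print(solve_noise_free(*obs))
```

With real data the pseudorange noise enters $I$ and $\rho^*$ directly and is divided by a wavelength of about 0.2 m in the ambiguities, which is why a small pseudorange standard deviation matters.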


14.6 Weight Matrix for Differenced Observations 


Suppose we are observing three (or m) satellites at an epoch. We can set up two (or 
m — 1) linearly independent double differenced observations. Since the observations are 
correlated, the weight matrix is no longer diagonal. Therefore this weight matrix must
be handled with special care when the normal equations are formed. We now study this
problem in detail.

Let the covariance matrix for the original phase observations be $\Sigma_b = \sigma^2 I$. The
observations are given with equal weight for phases (and they are independent). The unit
for the standard deviation $\sigma$ is length; as an example $\sigma = 0.01$ m.

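First of all we have the one-way phase difference observations.

$$b_s = \begin{bmatrix} \Phi_i^k \\ \Phi_j^k \\ \Phi_i^l \\ \Phi_j^l \end{bmatrix}. \tag{14.24}$$

We shall determine the covariance matrix of two distinct single differences:

$$s = \begin{bmatrix} \Phi_{ij}^k \\ \Phi_{ij}^l \end{bmatrix} = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} \Phi_i^k \\ \Phi_j^k \\ \Phi_i^l \\ \Phi_j^l \end{bmatrix} = D_s b_s. \tag{14.25}$$

The law of variance propagation (using $s$ to indicate single differences) yields

$$\Sigma_s = D_s \Sigma_b D_s^{\mathrm T} = \sigma^2 D_s D_s^{\mathrm T} = \sigma^2 \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} = 2\sigma^2 I. \tag{14.26}$$

Single differences over a single baseline are uncorrelated. Single differences between
different epochs are uncorrelated, too. However single differences can be correlated when
more than one baseline is considered and they contain the same one-way phase
observations (using subscript $c$ to indicate the common receiver $j$, and $r$ for the third receiver):

$$c = \begin{bmatrix} \Phi_{ij}^k \\ \Phi_{jr}^k \end{bmatrix} = \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} \Phi_i^k \\ \Phi_j^k \\ \Phi_r^k \end{bmatrix} = D_c b_c. \tag{14.27}$$

Now the covariance matrix is not diagonal:

$$\Sigma_c = D_c \Sigma_b D_c^{\mathrm T} = \sigma^2 D_c D_c^{\mathrm T} = \sigma^2 \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}. \tag{14.28}$$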


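Then we compute the covariance for double differences between satellites $k$, $l$, and $m$.
Satellite $k$ is chosen as reference satellite:

$$d = \begin{bmatrix} \Phi_{ij}^{kl} \\ \Phi_{ij}^{km} \end{bmatrix} = \begin{bmatrix} 1 & -1 & -1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} \Phi_i^k \\ \Phi_j^k \\ \Phi_i^l \\ \Phi_j^l \\ \Phi_i^m \\ \Phi_j^m \end{bmatrix} = D_d b_d. \tag{14.29}$$

The covariance matrix (using subscript $d$ to indicate double difference) is

$$\Sigma_d = D_d \Sigma_b D_d^{\mathrm T} = \sigma^2 D_d D_d^{\mathrm T} = \sigma^2 \begin{bmatrix} 4 & 2 \\ 2 & 4 \end{bmatrix}. \tag{14.30}$$

Again we see a nondiagonal covariance matrix: double difference observations even over
a single baseline are correlated.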
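Finally comes the correlation of triple differences between epochs $t$ and $t + 1$:

$$t = D_t b_t = \begin{bmatrix} 1 & -1 & -1 & 1 & -1 & 1 & 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 & -1 & 1 & -1 & 1 & 1 & -1 \end{bmatrix} b_t$$

where now $b_t$ involves three epochs (time differences at two epochs):

$$b_t = \big[\, \Phi_i^k(t)\ \ \Phi_j^k(t)\ \ \Phi_i^l(t)\ \ \Phi_j^l(t)\ \ \Phi_i^k(t+1)\ \ \Phi_j^k(t+1)\ \ \Phi_i^l(t+1)\ \ \Phi_j^l(t+1)\ \ \Phi_i^k(t+2)\ \ \Phi_j^k(t+2)\ \ \Phi_i^l(t+2)\ \ \Phi_j^l(t+2) \,\big]^{\mathrm T}.$$

The covariance matrix for the triple difference is

$$\Sigma_t = D_t \Sigma_b D_t^{\mathrm T} = \sigma^2 D_t D_t^{\mathrm T} = \sigma^2 \begin{bmatrix} 8 & -4 \\ -4 & 8 \end{bmatrix}. \tag{14.31}$$

So we conclude that triple differences are also correlated. Unlike single and double
differences, this correlation extends over epochs as well as within a single epoch.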
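
Since $\Sigma_b = \sigma^2 I$, each covariance matrix in this section is $\sigma^2$ times the product $D D^{\mathrm T}$ of a differencing matrix with its transpose. The short Python sketch below forms those products for the four cases; the function name `gram` is chosen here for illustration.

```python
def gram(D):
    """Return D D^T, the covariance of D b in units of sigma^2."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in D] for r1 in D]

# Single differences over one baseline, two satellites.
D_s = [[-1, 1, 0, 0], [0, 0, -1, 1]]
# Single differences over two baselines sharing one receiver.
D_c = [[-1, 1, 0], [0, -1, 1]]
# Double differences with a common reference satellite.
D_d = [[1, -1, -1, 1, 0, 0], [1, -1, 0, 0, -1, 1]]
# Triple differences over three consecutive epochs.
dd = [1, -1, -1, 1]
neg = [-v for v in dd]
zero = [0, 0, 0, 0]
D_t = [dd + neg + zero, zero + dd + neg]

print(gram(D_s))  # [[2, 0], [0, 2]]
print(gram(D_c))  # [[2, -1], [-1, 2]]
print(gram(D_d))  # [[4, 2], [2, 4]]
print(gram(D_t))  # [[8, -4], [-4, 8]]
```

The weight matrix for the normal equations is the inverse of the relevant product, which is why correlated differences cannot be treated with a diagonal weight matrix.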


14.7 Geometry of the Ellipsoid 


Ellipsoidal System 


In geodesy the reference surface is an ellipsoid of revolution. This is a reasonable
approximation to the Earth (but of course not perfect). It is described by the rotation of an ellipse
around its minor axis (the $Z$-axis). The ellipse is given by

$$\frac{p^2}{a^2} + \frac{Z^2}{b^2} = 1. \tag{14.32}$$

This meridian ellipse in Figure 14.3 is a slice taken through the North Pole and the South
Pole. Often the dimensionless quantities of flattening $f$ or second eccentricity $e'$ are used
to describe the shape of the ellipse:

$$f = \frac{a - b}{a} \quad\text{and}\quad e'^2 = \frac{a^2 - b^2}{b^2}. \tag{14.33}$$

The ratio of the semi axes is

$$\frac{b}{a} = 1 - f = \frac{1}{\sqrt{1 + e'^2}}. \tag{14.34}$$
Figure 14.3 The meridian ellipse and the ellipsoid of revolution.


Finally we often use the radius of curvature $c = a^2/b$ at the poles:

$$c = \frac{a^2}{b} = a\sqrt{1 + e'^2} = b(1 + e'^2). \tag{14.35}$$

The flattening has a very small value $f = 0.003\,35$. On a globe of radius $a = 1$ m the polar
diameter would be only about 7 mm shorter than the equatorial diameter. In this case $e' \approx 0.08209$.
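
The relations (14.33) to (14.35) can be checked numerically. The following Python sketch uses the WGS 84 values of $a$ and $f$; the function name is an assumption made for this illustration.

```python
import math

def ellipsoid_parameters(a, f):
    """Derived quantities of an ellipsoid with semi major axis a and flattening f."""
    b = a * (1.0 - f)                      # semi minor axis, from (14.34)
    e_prime_sq = (a * a - b * b) / (b * b)  # second eccentricity squared (14.33)
    c = a * a / b                          # polar radius of curvature (14.35)
    return {"b": b, "e_prime": math.sqrt(e_prime_sq), "c": c}

wgs84 = ellipsoid_parameters(6378137.0, 1.0 / 298.257223563)
print(wgs84)
```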

A point $Q$ on the ellipsoid is determined by $(\varphi, \lambda)$ = (latitude, longitude). The
geographical latitude $\varphi$ is the angle between the normal at $Q$ and the plane of the Equator.
For an ellipse, the normal at $Q$ does not go through the center point. (This is the price we
must pay for flattening; $\varphi$ is not the angle from the center of the ellipse.) The geographical
longitude $\lambda$ is the angle between the plane of the meridian of $Q$ and the plane of a reference
meridian (through Greenwich).

If we connect all points on the ellipsoid with equal latitude $\varphi$, this closed curve is
$\varphi$ = constant; it is called a parallel. In a similar way all points with equal longitude lie on
the parameter curve $\lambda$ = constant, which is a meridian. Meridians and parallels constitute
the geographical net (or grid). The meridians are ellipses while the parallels are circles.
The geographical coordinates can be understood both as angles and as surface coordinates.


Table 14.3 Parameters for reference ellipsoids used in North America

Reference Ellipsoid                      a (meters)     f = (a - b)/a
Clarke 1866                              6 378 206.4    1/294.9786982
Geodetic Reference System 1980           6 378 137      1/298.257222101
World Geodetic System 1984 (WGS 84)      6 378 137      1/298.257223563
Geographical longitude is reckoned from the meridian $\lambda_0 = 0$ of Greenwich. A
surface curve through the point $Q$ makes an angle with the meridian; this angle is called
the azimuth $A$ of the curve. The azimuth of a parallel is $A = \pi/2$ or $3\pi/2$. Normally $A$ is
computed clockwise from the northern branch of the meridian.

The reader will remember that it was Ptolemy, looking at the shape of the Mediter- 
ranean Sea, who gave us the words longitude (in the long direction) and latitude (across). 


The Ellipsoidal System Extended to Outer Space 


To determine a point $P$ above the surface of the Earth, we use geographical coordinates
$\varphi$, $\lambda$ and the height $h$. The height above the ellipsoid is measured along a perpendicular
line. Let a point $Q$ on the ellipsoid have coordinates $\varphi$, $\lambda$. In a geocentric $X, Y, Z$-system,
with $X$-axis at longitude $\lambda = 0$, the point $Q$ with height $h = 0$ has coordinates

$$\begin{aligned} X &= N\cos\varphi\cos\lambda \\ Y &= N\cos\varphi\sin\lambda \\ Z &= (1 - f)^2 N\sin\varphi. \end{aligned}$$

The distance to the $Z$-axis along the normal at $Q$ is $N = a/\sqrt{1 - f(2 - f)\sin^2\varphi}$. This is
the radius of curvature in the direction perpendicular to the meridian (the prime vertical).

The formula for $N$ results from substitution of $p = N\cos\varphi$ and (14.34) into (14.32).
When $P$ is above the ellipsoid, we must add to $Q$ a vector of length $h$ along the normal.
From spatial geographical $(\varphi, \lambda, h)$ we get spatial Cartesian $(X, Y, Z)$:

$$X = (N + h)\cos\varphi\cos\lambda \tag{14.36}$$
$$Y = (N + h)\cos\varphi\sin\lambda \tag{14.37}$$
$$Z = \big((1 - f)^2 N + h\big)\sin\varphi. \tag{14.38}$$

Figure 14.4 Conversion between $(\varphi, \lambda, h)$ and Cartesian $(X, Y, Z)$.
Example 14.3 Let $P$ be given by the coordinates $(\varphi, \lambda, h)$ in the WGS 84 system:

$\varphi$ = 40° 07′ 04.595 51″
$\lambda$ = 277° 01′ 10.221 76″
$h$ = 231.562 m

We seek the $(X, Y, Z)$ coordinates of $P$. The result is achieved by the M-file g2c:

$X$ = 596 915.961 m
$Y$ = −4 847 845.536 m
$Z$ = 4 088 158.163 m.
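
For readers without the M-files, the conversion (14.36) to (14.38) can be sketched in Python. This is an illustration, not a port of g2c: the function names are hypothetical and the default ellipsoid is WGS 84.

```python
import math

def geodetic_to_cartesian(lat, lon, h, a=6378137.0, f=1.0 / 298.257223563):
    """Convert (lat, lon, h) to (X, Y, Z); angles in radians, lengths in metres."""
    # Radius of curvature in the prime vertical
    N = a / math.sqrt(1.0 - f * (2.0 - f) * math.sin(lat) ** 2)
    X = (N + h) * math.cos(lat) * math.cos(lon)          # (14.36)
    Y = (N + h) * math.cos(lat) * math.sin(lon)          # (14.37)
    Z = ((1.0 - f) ** 2 * N + h) * math.sin(lat)         # (14.38)
    return X, Y, Z

def dms_to_radians(degrees, minutes, seconds):
    """Angle given as degrees, minutes, seconds converted to radians."""
    return math.radians(degrees + minutes / 60.0 + seconds / 3600.0)

# The point of Example 14.3; the output can be compared with the values above.
lat = dms_to_radians(40, 7, 4.59551)
lon = dms_to_radians(277, 1, 10.22176)
print(geodetic_to_cartesian(lat, lon, 231.562))
```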


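The reverse problem, to compute $(\varphi, \lambda, h)$ from $(X, Y, Z)$, requires an iteration for
$\varphi$ and $h$. Directly $\lambda = \arctan(Y/X)$. There is quick convergence for $h \ll N$, starting at
$h = 0$:

$$\varphi \text{ from } h \text{ (14.38):} \quad \varphi = \arctan\left(\frac{Z}{\sqrt{X^2 + Y^2}}\left(1 - \frac{(2 - f) f N}{N + h}\right)^{-1}\right) \tag{14.39}$$

$$h \text{ from } \varphi \text{ (14.36)-(14.37):} \quad h = \frac{\sqrt{X^2 + Y^2}}{\cos\varphi} - N. \tag{14.40}$$

For large $h$ (or $\varphi$ close to $\pi/2$) we recommend the procedure given in the M-file c2gm.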


Example 14.4 Given the same point as in Example 14.3, the M-file c2gm solves the
reverse problem. The result agrees with the original values for $(\varphi, \lambda, h)$ in Example 14.3.
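
The simple iteration (14.39) and (14.40) can be sketched in Python as follows. This is not the procedure of c2gm, which is recommended for large $h$ or points near the poles; the function name and the fixed iteration count are assumptions made for this illustration.

```python
import math

def cartesian_to_geodetic(X, Y, Z, a=6378137.0, f=1.0 / 298.257223563,
                          iterations=10):
    """Iterate (14.39)-(14.40), starting at h = 0. Not suited to points near the poles."""
    e2 = f * (2.0 - f)
    p = math.hypot(X, Y)
    lon = math.atan2(Y, X)                 # in (-pi, pi]; add 2*pi if needed
    lat = math.atan2(Z, p * (1.0 - e2))    # (14.39) with h = 0
    for _ in range(iterations):
        N = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - N                               # (14.40)
        lat = math.atan2(Z, p * (1.0 - e2 * N / (N + h)))       # (14.39)
    N = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
    h = p / math.cos(lat) - N
    return lat, lon, h

# The Cartesian coordinates of Example 14.3.
lat, lon, h = cartesian_to_geodetic(596915.961, -4847845.536, 4088158.163)
print(math.degrees(lat), math.degrees(lon) % 360.0, h)
```

Each pass reduces the latitude error by a factor of roughly $f(2 - f) \approx 0.007$ for points near the surface, so a handful of iterations is already more than enough.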


Using GPS for positioning we have to convert coordinates from Cartesian $(X, Y, Z)$ to
spatial geographical $(\varphi, \lambda, h)$ and vice versa as described in (14.39)-(14.40) and
(14.36)-(14.38). The immediate result of a satellite positioning is a set of $X, Y, Z$-values which we
most often want to convert to $\varphi, \lambda, h$-values.


14.8 The Direct and Reverse Problems 


A basic GPS output is the coordinates of a single point. Between two points we compute 
distance and azimuth. Or from a known point we calculate “polar coordinates” to another 
given point. 

Traditionally the problem of determining $(\varphi_2, \lambda_2)$ for a new point, from a known
$(\varphi_1, \lambda_1)$ and distance $S$ and azimuth $A$, has been called the direct problem. The reverse
problem is to compute distance $S$ and the two azimuths $A_1$ and $A_2$ between known points
with given coordinates $(\varphi_1, \lambda_1)$ and $(\varphi_2, \lambda_2)$.

Through ages those two problems have attracted computationally oriented persons. 
We shall only mention a solution given by Gauss. Its limitation is that the distance S should 
be smaller than 50 km. For longer distances we recommend the formulas of Bessel. 
We quote the basic equations called the mid latitude formulas:

$$S\sin A = N l\cos\varphi\left(1 - \frac{1}{24}(l\sin\varphi)^2 + \frac{1 + \eta^2 - 9\eta^2 t^2}{24V^4}\,b^2\right) \tag{14.41}$$

$$S\cos A = N b'\cos(l/2)\left(1 + \frac{1 - 2\eta^2}{24}(l\cos\varphi)^2 + \frac{\eta^2(1 - t^2)}{8V^4}\,b^2\right) \tag{14.42}$$

$$\Delta A = l\sin\varphi\left(1 + \frac{1 + \eta^2}{12}(l\cos\varphi)^2 + \frac{3 + 8\eta^2}{24V^4}\,b^2\right). \tag{14.43}$$

Here $\lambda_2 - \lambda_1 = l$ and $\varphi_2 - \varphi_1 = b$ and $b/V^2 = b'$. Furthermore according to geodetic
tradition we have set $t = \tan\varphi$, $\eta = e'\cos\varphi$, $V^2 = 1 + \eta^2$, $V = c/N$ and
$\Delta A = A_2 - A_1$.

In the direct problem we seek $\varphi_2$ and $\lambda_2$ and $A_2$. The mid latitude formulas
(14.41)-(14.43) can only be solved iteratively for $l$ and $b$. The correction terms $l\cos\varphi$ and $l\sin\varphi$
remain unchanged during the iteration. The changing values for $l$ and $b$ can be determined
with sufficient accuracy from the main terms outside the parentheses:

$$l = \frac{S\sin A}{N\cos\varphi}\left(1 + \cdots (l\sin\varphi)^2 - \cdots b^2\right)$$

$$b' = \frac{S\cos A}{N\cos(l/2)}\left(1 - \cdots (l\cos\varphi)^2 - \cdots b^2\right)$$

$$\Delta A = l\sin\varphi\left(1 + \cdots (l\cos\varphi)^2 + \cdots b^2\right).$$

In the second iteration $\varphi_2$ and $\lambda_2$ have accuracy better than 0.0001″ and $A_2$ better than
0.001″.

The M-file gauss1 solves the direct problem. Equations (14.41) and (14.42) allow 
for an easy solution of the reverse problem, because they directly yield S sin A and S cos A. 
The M-file gauss2 does the job. 
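
As a minimal sketch of the reverse problem, the following Python function evaluates (14.41)-(14.43) directly. It is not the M-file gauss2. The function name, the argument order and the default GRS 80 constants are our assumptions, and we assume that $\varphi$ and $A$ in the formulas refer to the midpoint of the line, so that $A_1 = A - \Delta A/2$ and $A_2 = A + \Delta A/2$.

```python
import math


def gauss_reverse(phi1, lam1, phi2, lam2, a=6378137.0, f=1.0 / 298.257222101):
    """Reverse problem from the mid latitude formulas (14.41)-(14.43).

    Angles are in radians. Returns (S, A1, A2) with S in the unit of a.
    """
    e2 = f * (2.0 - f)                 # first eccentricity squared
    ep2 = e2 / (1.0 - e2)              # second eccentricity squared e'^2
    phi = 0.5 * (phi1 + phi2)          # mid latitude (assumed evaluation point)
    b = phi2 - phi1                    # latitude difference
    l = lam2 - lam1                    # longitude difference
    t = math.tan(phi)
    eta2 = ep2 * math.cos(phi) ** 2
    V2 = 1.0 + eta2
    N = a / math.sqrt(1.0 - e2 * math.sin(phi) ** 2)   # equals c / V
    bp = b / V2                        # b' = b / V^2

    s_sin_a = N * l * math.cos(phi) * (
        1.0 - (l * math.sin(phi)) ** 2 / 24.0
        + (1.0 + eta2 - 9.0 * eta2 * t * t) / (24.0 * V2 * V2) * b * b)
    s_cos_a = N * bp * math.cos(l / 2.0) * (
        1.0 + (1.0 - 2.0 * eta2) / 24.0 * (l * math.cos(phi)) ** 2
        + eta2 * (1.0 - t * t) / (8.0 * V2 * V2) * bp * bp)
    delta_a = l * math.sin(phi) * (
        1.0 + (1.0 + eta2) / 12.0 * (l * math.cos(phi)) ** 2
        + (3.0 + 8.0 * eta2) / (24.0 * V2 * V2) * b * b)

    S = math.hypot(s_sin_a, s_cos_a)
    A = math.atan2(s_sin_a, s_cos_a)   # mean azimuth
    two_pi = 2.0 * math.pi
    return S, (A - delta_a / 2.0) % two_pi, (A + delta_a / 2.0) % two_pi
```

Because (14.41) and (14.42) deliver $S \sin A$ and $S \cos A$ separately, no iteration is needed here; the distance and the mean azimuth follow from one call to `hypot` and `atan2`.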


14.9 Geodetic Reference System 1980 


Table 14.3 lists values for a and f that have been used throughout time to describe the 
reference ellipsoid. For global use the latest values (adopted at the General Assembly of 
the International Union of Geodesy and Geophysics, Canberra 1980) are 

$$\begin{aligned}
a &= 6\,378\,137\ \mathrm{m} \\
kM &= 3.986\,005 \times 10^{14}\ \mathrm{m^3/s^2} \\
J_2 &= 108\,263 \times 10^{-8} \\
\omega &= 7.292\,115 \times 10^{-5}\ \mathrm{rad/s}.
\end{aligned}$$

Here $kM$ is the product of the universal constant of attraction and the mass of the Earth. 

The coefficient $J_2$ is given from the spherical harmonic series expansion of the normal 
gravity field of the Earth. The number $J_2$ is closely related to $f$ and is also named the 
dynamic shape factor of the Earth. Finally $\omega$ denotes the mean rotation rate of the Earth. 
For the flattening we may derive the value $1/f = 298.257\,222\,101$ from these values. 
Hence we have described the surface to which the ellipsoidal height h is related. 
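
The derivation of $1/f$ is not spelled out here. As a hedged sketch, the Python function below iterates the closed relation between $e^2$ and the four defining constants that is given in the published description of GRS 80, $e^2 = 3J_2 + \frac{4}{15}\frac{\omega^2 a^3}{kM}\frac{e^3}{2q_0}$ with $2q_0 = (1 + 3/e'^2)\arctan e' - 3/e'$. This relation and the function name are brought in from outside this section and should be read as assumptions.

```python
import math


def inverse_flattening(a, kM, J2, omega, iterations=20):
    """Iterate e^2 = 3*J2 + (4/15) * (omega^2 a^3 / kM) * e^3 / (2 q0)."""
    m_bar = omega ** 2 * a ** 3 / kM
    e2 = 3.0 * J2                        # start: rotation term neglected
    for _ in range(iterations):
        e = math.sqrt(e2)
        ep = e / math.sqrt(1.0 - e2)     # second eccentricity e'
        two_q0 = (1.0 + 3.0 / ep ** 2) * math.atan(ep) - 3.0 / ep
        e2 = 3.0 * J2 + (4.0 / 15.0) * m_bar * e ** 3 / two_q0
    f = 1.0 - math.sqrt(1.0 - e2)        # from e^2 = f (2 - f)
    return 1.0 / f


print(inverse_flattening(6378137.0, 3.986005e14, 108263e-8, 7.292115e-5))
```

The fixed-point iteration settles after a few passes, and the printed value should agree with the quoted $1/f$ to the digits that double precision allows.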


Example 14.5 In Example 14.3 we considered a point with coordinates 
(X, Y, Z) = (596 915.961, —4 847 845.536, 4088 158.163). 
For this terrain point, c2g.m computes $(\varphi, \lambda, h)$ on Clarke's ellipsoid of 1866: 

$$\begin{aligned}
\varphi &= 40^\circ\ 07'\ 12.192\,79'' \\
\lambda &= 277^\circ\ 01'\ 10.221\,77'' \\
h &= 260.754\ \mathrm{m}.
\end{aligned}$$

We compare with the WGS 84 values given earlier: 

$$\begin{aligned}
\varphi &= 40^\circ\ 07'\ 04.595\,52'' \\
\lambda &= 277^\circ\ 01'\ 10.221\,77'' \\
h &= 231.562\ \mathrm{m}.
\end{aligned}$$

Due to rotational symmetry there is no difference in $\lambda$. The height $h$ and the latitude $\varphi$ 
are changed substantially and of the same order of magnitude as the change in $a$, namely 
69.4 m. To the accuracy we are using here, the computations in relation to GRS 80 would 
agree with the WGS 84 result. 
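
The comparison in this example can be repeated with a short Python sketch. It is not the M-file c2g.m. The inverse conversion uses a simple fixed-point iteration on the latitude, which is one common choice among several, and the function names and the ellipsoid constants typed in below (Clarke 1866: $a = 6\,378\,206.4$ m, $1/f = 294.978\,698\,2$; WGS 84: $1/f = 298.257\,223\,563$) are our assumptions.

```python
import math


def geodetic_to_cartesian(phi, lam, h, a, f):
    """(phi, lambda, h) to (X, Y, Z); angles in radians."""
    e2 = f * (2.0 - f)
    N = a / math.sqrt(1.0 - e2 * math.sin(phi) ** 2)
    X = (N + h) * math.cos(phi) * math.cos(lam)
    Y = (N + h) * math.cos(phi) * math.sin(lam)
    Z = ((1.0 - f) ** 2 * N + h) * math.sin(phi)
    return X, Y, Z


def cartesian_to_geodetic(X, Y, Z, a, f, tol=1e-13, max_iter=50):
    """Fixed-point iteration on the latitude; not valid at the poles."""
    e2 = f * (2.0 - f)
    p = math.hypot(X, Y)
    lam = math.atan2(Y, X) % (2.0 * math.pi)
    phi = math.atan2(Z, p * (1.0 - e2))          # starting value for h = 0
    for _ in range(max_iter):
        N = a / math.sqrt(1.0 - e2 * math.sin(phi) ** 2)
        h = p / math.cos(phi) - N
        phi_new = math.atan2(Z, p * (1.0 - e2 * N / (N + h)))
        converged = abs(phi_new - phi) < tol
        phi = phi_new
        if converged:
            break
    N = a / math.sqrt(1.0 - e2 * math.sin(phi) ** 2)
    h = p / math.cos(phi) - N
    return phi, lam, h


point = (596915.961, -4847845.536, 4088158.163)
for name, a, f in (("Clarke 1866", 6378206.4, 1.0 / 294.9786982),
                   ("WGS 84", 6378137.0, 1.0 / 298.257223563)):
    phi, lam, h = cartesian_to_geodetic(*point, a, f)
    print(name, math.degrees(phi), math.degrees(lam), round(h, 3))
```

The longitude depends only on $X$ and $Y$, so it is identical for both ellipsoids, while latitude and height change with $a$ and $f$.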


14.10 Geoid, Ellipsoid, and Datum 


The geoid is a surface that is defined physically, not geometrically. Contour maps show the 
geoid as an irregular surface with troughs and ripples caused by local mass distribution. 

The center of the geoid coincides with the center of the Earth. It is an equipotential 
surface. One can imagine the geoid as the sea surface if the Earth were totally covered by water. 
Theoretically this sea surface would be at constant potential, since the water would stream if 
there were any difference of height. The actual sea surface deviates a little from this true 
equipotential surface because of variation in sea temperature, salinity, and sea currents. 

The geoid is the reference surface for leveling. In most countries the mean of the 
geoid is fixed to coincide with mean sea level (registered by mareographs). This does not 
imply that height zero will coincide with mean sea level at the sea shore. In reality there 
can be up to 1 m in difference between zero level of the leveling network and the mean sea 
level at a given point. The geoid is very irregular and it is impossible to model it exactly. 
Geodesists have spent much time on developing approximations by spherical harmonics. 

At a terrain point P there exists a basic connection between the ellipsoidal height h 
which is a GPS determined quantity, the geoidal undulation N which is the height of the 
geoid above the ellipsoid, and the orthometric height H: 


$$h = H + N. \tag{14.44}$$


In daily life we hardly think about the assumptions made for measuring distances and 
directions. It seems obvious to refer all our measurements to a “horizontal” plane. This 
plane is perpendicular to the (local) plumb line. However this is no unique description; a 
point just 100 m away has a different tangent plane. This is because the plumb lines at two 
neighboring points converge. They are not parallel. 

Figure 14.5 Ellipsoidal height h, geoid undulation N and orthometric height H. 

Unfortunately, this convergence is not easy to describe by a simple formula. Its 
variation is irregular. But a major part can be separated. This regular part corresponds 
to introducing a reference ellipsoid given by adequate parameters and with a reasonable 
position in relation to the Earth. Note that our original problem was of local nature and we 
now give a solution of global nature. 

Most geodetic observations refer directly to the plumb line, because our instruments 
are set up by means of levels. One exception is electronically measured distances. To start 
we most often reduce the actual observations to the reference ellipsoid. This requires a 
knowledge of the orientation of the ellipsoid in relation to the plumb line. This description 
may be given by the angle between the plumb line at P and the ellipsoidal normal, and 
the distance of P from the ellipsoid. The angle is called the deflection of the vertical. 
Usually it is split into north-south and east-west components $\xi$ and $\eta$. Finally the geoid 
undulation N between ellipsoid and geoid is related to the distance from the ellipsoid h 
and orthometric height H of the point as given in (14.44). Often N is an unknown quantity 
that has a global variation up to one hundred meters. GPS measurements furnish us with h. 
So if N is known we can determine the height H. That is leveling! 

The relative position between the geoid and the ellipsoid at P is described by three 
quantities: $\xi$, $\eta$, and $N$. If we do not know better, they are often set equal to zero. 

Additionally we have to describe the size and shape of the ellipsoid. This requires 
two parameters, usually the semi major axis a and the flattening f. All in all five param- 
eters describe the relation between the rotational ellipsoid and the plumb line at a given 
terrain point P. Such a set of parameters is called a datum. 

Today the three parameters $\xi$, $\eta$, $N$ are often replaced by equivalent quantities $t_X$, 
$t_Y$, $t_Z$. These are translations of the center of the ellipsoid relative to the center of the Earth. 
So modern datum descriptions include the five quantities: $a$, $f$, $t_X$, $t_Y$, and $t_Z$. 

Such a datum can be extended to connected areas of land. But if the geodesist has 
to cross larger areas covered by water, it is impossible by classical methods to transfer 
knowledge about the deflection of the vertical. So a datum can at most cover a continent. 
However, GPS opens the possibility for merging all continental data. A realization of this 
is the WGS 84 geodetic system, described below. 

To be concrete, we introduce the unit vector $u$ along the plumb line: 

$$u = \begin{bmatrix} \cos\Phi \cos\Lambda \\ \cos\Phi \sin\Lambda \\ \sin\Phi \end{bmatrix}$$

where $(\Phi, \Lambda)$ denote the astronomical coordinates of the surface point P. Those coordinates 
can be determined by classical astronomical observations. 

Any vector $\delta x$ from P can be described by its slope distance $s$, zenith distance $z$ 
and azimuth $\alpha$ (from the northern branch of the meridian). The representation in Cartesian 
coordinates is 

$$\delta x = \begin{bmatrix} \delta x_1 \\ \delta x_2 \\ \delta x_3 \end{bmatrix} = s \begin{bmatrix} \sin\alpha \sin z \\ \cos\alpha \sin z \\ \cos z \end{bmatrix}$$

where $\delta x_1$ is east, $\delta x_2$ is north, and $\delta x_3$ is in direction of $u$. This local topocentric system is 
based on the surface of the Earth (the topography). It can be related to a geocentric system 
with origin at the center of the Earth. This is an Earth centered, Earth fixed system, ECEF 
for short. Now the axes point to the Greenwich meridian at the Equator, to the east, and to 
the conventional terrestrial pole (CTP). 

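The transformation from topocentric $\delta x$ to geocentric $\delta X = F\,\delta x$ is by an orthogonal 
matrix $F$: 

$$F = P_2 R_3(\pi - \Lambda) R_2(\tfrac{\pi}{2} - \Phi). \tag{14.45}$$

Explicitly this product of plane rotations gives the unit vectors $e$, $n$, $u$: 

$$F = \begin{bmatrix} -\sin\Lambda & -\sin\Phi \cos\Lambda & \cos\Phi \cos\Lambda \\ \cos\Lambda & -\sin\Phi \sin\Lambda & \cos\Phi \sin\Lambda \\ 0 & \cos\Phi & \sin\Phi \end{bmatrix} = \begin{bmatrix} e & n & u \end{bmatrix}. \tag{14.46}$$

The inverse transformation is 

$$\delta x = F^{-1}\,\delta X = F^{\mathrm T}\,\delta X. \tag{14.47}$$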


Over and over again equations (14.45) and (14.47) are used to combine GPS and terrestrial 
observations. Note that we use small letters x in the topocentric system (small coordinate 
values around the surface point) and capital letters X in the geocentric system (large co- 
ordinate values around the Earth’s center). In (14.14), we use F with geographical rather 
than astronomical coordinates. We shall now describe this difference. 

To perform computations relative to the ellipsoid, the situation is a little different. 
We repeat the connection between astronomical and geographical coordinates and the def- 
inition of components of the deflection of the vertical: 


$$\xi = \Phi - \varphi \tag{14.48}$$

$$\eta = (\Lambda - \lambda) \cos\varphi. \tag{14.49}$$

Figure 14.6 Zenith distance $z$ and azimuth $\alpha$ in the topocentric system (e, n, u). 


The geographical coordinates $(\varphi, \lambda)$ and the height $h$ above the ellipsoid can be transformed 
to Cartesian coordinates. The new (X, Y, Z) system has origin at the center of the 
ellipsoid, 3-axis coinciding with the polar axis of revolution, 1-axis in the plane of the zero 
meridian, and 2-axis pointing eastwards. According to (14.37) the transformation is 

$$\begin{aligned}
X &= (N + h) \cos\varphi \cos\lambda \\
Y &= (N + h) \cos\varphi \sin\lambda \\
Z &= \bigl((1 - f)^2 N + h\bigr) \sin\varphi.
\end{aligned} \tag{14.50}$$

From now on we will use geographical coordinates $(\varphi, \lambda)$. 


Example 14.6 To emphasize the fundamental impact of the matrix F we shall determine 
the elevation angle for a satellite. The local topocentric system uses three unit vectors 
(e, n, u) = (east, north, up). Those are the columns of F. The vector r between satellite k 
and receiver i is 


$$r = (X^k - X_i,\; Y^k - Y_i,\; Z^k - Z_i).$$

The unit vector in this satellite direction is $\rho = r/\|r\|$. Then Figure 14.6 gives 

$$\begin{aligned}
\rho^{\mathrm T} e &= \sin\alpha \sin z \\
\rho^{\mathrm T} n &= \cos\alpha \sin z \\
\rho^{\mathrm T} u &= \cos z.
\end{aligned}$$

From this we determine $\alpha$ and $z$. Especially we have $\sin h = \cos z = \rho^{\mathrm T} u$ for the elevation 
angle $h$. The angle $h$ or rather $\sin h$ is an important parameter for any procedure calculating 
the tropospheric delay, cf. the M-file tropo. 
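
The three inner products translate directly into code. The following Python sketch computes azimuth and elevation of a satellite from ECEF coordinates; the function names are ours, and the receiver latitude and longitude are assumed to be known already (for example from a previous position fix).

```python
import math


def enu_basis(phi, lam):
    """Columns e, n, u of F for geographical latitude phi and longitude lam."""
    e = (-math.sin(lam), math.cos(lam), 0.0)
    n = (-math.sin(phi) * math.cos(lam), -math.sin(phi) * math.sin(lam), math.cos(phi))
    u = (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))
    return e, n, u


def azimuth_elevation(sat, rec, phi, lam):
    """Azimuth and elevation (radians) of satellite sat seen from receiver rec."""
    r = [s - x for s, x in zip(sat, rec)]
    norm = math.sqrt(sum(c * c for c in r))
    rho = [c / norm for c in r]                      # unit vector towards satellite
    e, n, u = enu_basis(phi, lam)

    def dot(v, w):
        return sum(p * q for p, q in zip(v, w))

    azimuth = math.atan2(dot(rho, e), dot(rho, n)) % (2.0 * math.pi)
    sin_h = max(-1.0, min(1.0, dot(rho, u)))         # guard against rounding
    return azimuth, math.asin(sin_h)
```

Using `atan2` on the east and north components returns the azimuth in the correct quadrant without inspecting $\sin z$ separately.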

Furthermore the quantity $\sin h$ has a decisive role in planning observations: When 
is $h$ larger than 15°, say? Those are the satellites we prefer to use in GPS. Many other 
computations and investigations involve this elevation angle $h$. 


14.11 World Geodetic System 1984 


The ellipsoid in WGS 84 is defined through four parameters, with specified variances: 

1 the semi major axis $a = 6\,378\,137$ m ($\sigma_a = 2$ m) 

2 the Earth’s gravitational constant (including the mass of the Earth’s atmosphere) 
$kM = 3\,986\,005 \times 10^{8}\ \mathrm{m^3/s^2}$ ($\sigma_{kM} = 0.6 \times 10^{8}\ \mathrm{m^3/s^2}$) 

3 the normalized second degree zonal coefficient of the gravity potential $\bar{C}_{20}$ 

4 the Earth’s rotational rate $\omega = 7\,292\,115 \times 10^{-11}$ rad/s ($\sigma_\omega = 15 \times 10^{-11}$ rad/s). 


The International Astronomical Union uses $\omega_e = 7\,292\,115.1467 \times 10^{-11}$ rad/s, with four 
extra digits, together with a new definition of time. In order to maintain consistency with 
GPS it is necessary to use $\omega_e$ instead of $\omega$. The speed of light in vacuum is taken as 

$$c = 299\,792\,458\ \mathrm{m/s} \quad\text{with}\quad \sigma_c = 1.2\ \mathrm{m/s}.$$


Conceptually WGS 84 is a very special datum as it includes a model for the gravity 
field. The description is given by spherical harmonics up to degree and order 180. This 
adds 32 755 more coefficients to WGS 84 allowing for determination of the global features 
of the geoid. A truncated model (n = m = 18) of the geoid is shown in Figure 14.7. For a 
more detailed description see Department of Defense (1991). 

In North America the transformation from NAD 27 to WGS 84 is given as 


$$\begin{bmatrix} X_{\mathrm{WGS\,84}} \\ Y_{\mathrm{WGS\,84}} \\ Z_{\mathrm{WGS\,84}} \end{bmatrix} = \begin{bmatrix} X_{\mathrm{NAD\,27}} + 9\ \mathrm{m} \\ Y_{\mathrm{NAD\,27}} - 161\ \mathrm{m} \\ Z_{\mathrm{NAD\,27}} - 179\ \mathrm{m} \end{bmatrix}.$$

A typical datum transformation into WGS 84 only includes changes in the semi major axis 
of the ellipsoid and its flattening and three translations of the origin of the ellipsoid. 


Figure 14.7 The WGS 84 geoid for n = m = 18. The contour interval is 20 m. 


WGS 84 is a global datum, allowing us to transform between regions by means of 
GPS. Yet, we finish with a warning about such transformation formulas. They should only be 
used with caution. They are defined to an accuracy of a few meters, though the mathematical 
relationship is exact. The importance of WGS 84 is undoubtedly to provide a unified 
global datum. 


14.12 Coordinate Changes From Datum Changes 


In geodetic practice it is today common to have coordinates derived from GPS and also 
from traditional terrestrial methods. For computations we convert the local topocentric 
coordinates to geocentric coordinates, which can be compared with the GPS derived coor- 
dinates. Thus physical control points may have two sets of coordinates, one derived from 
classical methods and one derived from GPS. 

The connection between topocentric and geocentric coordinates is established by 
transformation formulas. The most general transformation includes rotations, translations, 
and a change of scale. This type of transformation is only established between Cartesian 
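systems. It is described by seven parameters: three translations $t_X$, $t_Y$, $t_Z$, three rotations 
$\epsilon_X$, $\epsilon_Y$, $\epsilon_Z$, and a change of scale $k$. We assume that the transformation is infinitesimal. 
Our starting point is (14.50) which for small changes becomes 

$$T = \begin{bmatrix} 1 & \epsilon_Z & -\epsilon_Y \\ -\epsilon_Z & 1 & \epsilon_X \\ \epsilon_Y & -\epsilon_X & 1 \end{bmatrix} \begin{bmatrix} (N + h) \cos\varphi \cos\lambda \\ (N + h) \cos\varphi \sin\lambda \\ \bigl((1 - f)^2 N + h\bigr) \sin\varphi \end{bmatrix} + k \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + \begin{bmatrix} t_X \\ t_Y \\ t_Z \end{bmatrix}. \tag{14.51}$$

A first order Taylor expansion of the radius of curvature in the prime vertical (that is the 
direction orthogonal to the meridian) yields 

$$(1 - f)^2 N \approx (1 - 2f + f^2)\, a (1 + f \sin^2\varphi) \approx a (1 + f \sin^2\varphi - 2f).$$

Equation (14.51) can be linearized using $t = (t_X, t_Y, t_Z)$ to 

$$T = \begin{bmatrix} 1 & \epsilon_Z & -\epsilon_Y \\ -\epsilon_Z & 1 & \epsilon_X \\ \epsilon_Y & -\epsilon_X & 1 \end{bmatrix} \begin{bmatrix} \bigl(a(1 + f \sin^2\varphi) + h\bigr) \cos\varphi \cos\lambda \\ \bigl(a(1 + f \sin^2\varphi) + h\bigr) \cos\varphi \sin\lambda \\ \bigl(a(1 + f \sin^2\varphi - 2f) + h\bigr) \sin\varphi \end{bmatrix} + k \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + t = R X + k X + t. \tag{14.52}$$

To find the linear dependency between the variables we differentiate $T$: 

$$dT = R\,dX + dR\,X + dk\,X + k\,dX + dt.$$

The product $k\,dX$ is of second order. To first order we have $R\,dX \approx dX$ and thus 

$$dT = dX + dR\,X + dk\,X + dt.$$

This yields the corrections $d\lambda$, $d\varphi$, $dh$ from changes in the size $a$ and shape $f$ of the 
ellipsoid, rotation by the small angles $d\epsilon_X$, $d\epsilon_Y$, $d\epsilon_Z$, and translations by $dt_X$, $dt_Y$, $dt_Z$. 
This can be done by setting $dT = 0$. Then a point before and after the transformation 
remains physically the same: 

$$dX = -dR\,X - dk\,X - dt. \tag{14.53}$$

Figure 14.8 Geocentric datum parameters and ellipsoidal coordinates $\varphi$, $\lambda$ and $h$. 

The left side $dX$ depends on $\lambda$, $\varphi$, $h$, $a$ and $f$. The total differential of $X$ in (14.52) is 

$$dX = \begin{bmatrix} -\sin\lambda & -\sin\varphi \cos\lambda & \cos\varphi \cos\lambda \\ \cos\lambda & -\sin\varphi \sin\lambda & \cos\varphi \sin\lambda \\ 0 & \cos\varphi & \sin\varphi \end{bmatrix} \begin{bmatrix} a \cos\varphi\, d\lambda \\ a\, d\varphi \\ dh \end{bmatrix} + \begin{bmatrix} \cos\varphi \cos\lambda & \sin^2\varphi \cos\varphi \cos\lambda \\ \cos\varphi \sin\lambda & \sin^2\varphi \cos\varphi \sin\lambda \\ \sin\varphi & (\sin^2\varphi - 2) \sin\varphi \end{bmatrix} \begin{bmatrix} da \\ a\, df \end{bmatrix} = F\alpha + G\beta. \tag{14.54}$$

Substitution into (14.53) gives 

$$F\alpha + G\beta = -dR\,X - dk\,X - dt.$$

We look for $\alpha$ and isolate it, remembering that orthogonality implies $F^{-1} = F^{\mathrm T}$: 

$$\alpha = -F^{\mathrm T} dR\,X - dk\,F^{\mathrm T} X - F^{\mathrm T} dt - F^{\mathrm T} G \beta. \tag{14.55}$$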


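Now follows a calculation of each component in $\alpha$: 

$$-F^{\mathrm T} dR\,X = -F^{\mathrm T} \begin{bmatrix} d\epsilon_Z Y - d\epsilon_Y Z \\ -d\epsilon_Z X + d\epsilon_X Z \\ d\epsilon_Y X - d\epsilon_X Y \end{bmatrix} = -F^{\mathrm T} \begin{bmatrix} 0 & -Z & Y \\ Z & 0 & -X \\ -Y & X & 0 \end{bmatrix} \begin{bmatrix} d\epsilon_X \\ d\epsilon_Y \\ d\epsilon_Z \end{bmatrix} = \begin{bmatrix} -a \sin\varphi \cos\lambda & -a \sin\varphi \sin\lambda & a \cos\varphi \\ a \sin\lambda & -a \cos\lambda & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} d\epsilon_X \\ d\epsilon_Y \\ d\epsilon_Z \end{bmatrix},$$

$$-F^{\mathrm T} G = \begin{bmatrix} 0 & 0 \\ 0 & 2 \sin\varphi \cos\varphi \\ -1 & \sin^2\varphi \end{bmatrix}, \quad\text{and}\quad F^{\mathrm T} X = \begin{bmatrix} 0 \\ 0 \\ a \end{bmatrix}.$$

This allows us to give an explicit expression for $\alpha$: 

$$\begin{bmatrix} a \cos\varphi\, d\lambda \\ a\, d\varphi \\ dh \end{bmatrix} = \begin{bmatrix} \sin\lambda & -\cos\lambda & 0 \\ \sin\varphi \cos\lambda & \sin\varphi \sin\lambda & -\cos\varphi \\ -\cos\varphi \cos\lambda & -\cos\varphi \sin\lambda & -\sin\varphi \end{bmatrix} \begin{bmatrix} dt_X \\ dt_Y \\ dt_Z \end{bmatrix} + \begin{bmatrix} -a \sin\varphi \cos\lambda & -a \sin\varphi \sin\lambda & a \cos\varphi \\ a \sin\lambda & -a \cos\lambda & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} d\epsilon_X \\ d\epsilon_Y \\ d\epsilon_Z \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ a \end{bmatrix} dk - \begin{bmatrix} 0 & 0 \\ 0 & -2a \sin\varphi \cos\varphi \\ 1 & -a \sin^2\varphi \end{bmatrix} \begin{bmatrix} da \\ df \end{bmatrix}. \tag{14.56}$$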

The number of datum parameters has been augmented by two, namely the changes in the 
ellipsoidal parameters a and f. The total number of parameters is thus nine. How does that 
agree with the fact that a datum is defined by five parameters? Well, we have increased this 
number by three rotations and one change of scale. By that we again come to the number 
nine and our bookkeeping is in balance again. 


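Example 14.7 An example of a transformation of the type described by (14.51) is the 
transformation between European Datum 1950 (ED 50) and WGS 84: 

$$\begin{bmatrix} X_{\mathrm{WGS\,84}} \\ Y_{\mathrm{WGS\,84}} \\ Z_{\mathrm{WGS\,84}} \end{bmatrix} = \begin{bmatrix} 0.0\ \mathrm{m} \\ 0.0\ \mathrm{m} \\ 4.5\ \mathrm{m} \end{bmatrix} + k \begin{bmatrix} 1 & \alpha & 0 \\ -\alpha & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_{\mathrm{ED\,50}} - 89.5\ \mathrm{m} \\ Y_{\mathrm{ED\,50}} - 93.8\ \mathrm{m} \\ Z_{\mathrm{ED\,50}} - 127.6\ \mathrm{m} \end{bmatrix}.$$

The inverse transformation back to the European Datum is 

$$\begin{bmatrix} X_{\mathrm{ED\,50}} \\ Y_{\mathrm{ED\,50}} \\ Z_{\mathrm{ED\,50}} \end{bmatrix} = \frac{1}{k} \begin{bmatrix} 1 & -\alpha & 0 \\ \alpha & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_{\mathrm{WGS\,84}} \\ Y_{\mathrm{WGS\,84}} \\ Z_{\mathrm{WGS\,84}} - 4.5\ \mathrm{m} \end{bmatrix} + \begin{bmatrix} 89.5\ \mathrm{m} \\ 93.8\ \mathrm{m} \\ 127.6\ \mathrm{m} \end{bmatrix}.$$

The variables have the following values: $\alpha = 0.156'' = 0.756 \times 10^{-6}$ rad and $k = 1 + 
1.2 \times 10^{-6}$ or $1/k = 0.999\,998\,8$. The transformation looks a little more involved than 
earlier expressions. This is due to inclusion of some minor differences between WGS 84 
and the original Doppler datum NSWC 9Z-2. 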
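
The two formulas of this example can be checked against each other numerically. The Python sketch below codes them as written, with coordinates in meters; the function names and the sample point are our own. Since the inverse uses the transposed first-order rotation, a round trip reproduces the input only up to a term of order $\alpha^2$, far below a millimeter.

```python
ALPHA = 0.756e-6          # rotation about the Z-axis in radians (0.156 arcsec)
K = 1.0 + 1.2e-6          # scale factor k


def ed50_to_wgs84(x, y, z):
    """ED 50 Cartesian coordinates (m) to WGS 84, as in Example 14.7."""
    dx, dy, dz = x - 89.5, y - 93.8, z - 127.6
    return (K * (dx + ALPHA * dy),
            K * (-ALPHA * dx + dy),
            4.5 + K * dz)


def wgs84_to_ed50(x, y, z):
    """WGS 84 Cartesian coordinates (m) back to ED 50."""
    dz = z - 4.5
    return ((x - ALPHA * y) / K + 89.5,
            (ALPHA * x + y) / K + 93.8,
            dz / K + 127.6)


# Hypothetical point somewhere in Europe, only to exercise the formulas.
p_ed50 = (3513649.0, 778955.0, 5248202.0)
p_back = wgs84_to_ed50(*ed50_to_wgs84(*p_ed50))
print([round(b - p, 6) for b, p in zip(p_back, p_ed50)])
```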


15 


PROCESSING OF GPS DATA 


15.1 Baseline Computation and M-Files 


The previous chapter described the principles of global positioning. This chapter will 
present and discuss a series of MATLAB M-files that are freely available to the reader. 
Their objective is to compute the baseline vector between two stationary receivers. This 
is the essence of GPS. Large systems like GAMIT and GIPSY and Bernese extend to a 
network of baselines. We believe that the essential problems are best understood by actual 
experiments with relatively simple subroutines. 

The outline of our presentation is given next. It could also be the outline of a lecture 
series or a hands-on GPS workshop. This outline imposes a structure on the 70 or more 
GPS M-files that are associated with this book. The baseline computation is achieved in 
five steps: 


Step 1: The positions of the satellites The satellite positions are to be computed in 
Earth-fixed x, y,z coordinates, from the ephemerides that are transmitted in Keplerian 
coordinates. We are using the raw navigation data files converted to *.*n files in RINEX 
format. New ephemerides will come every 60 to 90 minutes, so there may be multiple 
versions (covering different spans of time) for each satellite. 

This coordinate change (from space to Earth) will be discussed in Section 15.2. The 
Keplerian elements are stable and slowly varying. The x, y, z coordinates change rapidly. 
The applications of GPS need and use Earth coordinates. 


Step 2: Preliminary position of the receiver This is based on pseudorange data from 
each satellite. Recall that the pseudorange is the measurement of travel time from each 
satellite to each receiver, not accounting for clock errors and not highly accurate. The C/A 
code pseudorange has an accuracy of 30 meters. Our recpos code for the receiver positions 
is a simple and straightforward search. This is for pedagogical reasons; an alternative code 
is bancroft. Later they are both replaced by upgraded software which will use further 
information from RINEX observation files (P code distances and phases at the $L_1$ and the 
$L_2$ frequencies). 


Step 3: Separate estimation of ambiguities and the baseline vector This is our first 
serious estimate of the baseline (dx, dy, dz) between the two receivers. It is reasonably 
efficient for short baselines (less than 20 km). We first solve the ambiguity problem to find 
the integers $N_{1,ij}^{kl}$ and $N_{2,ij}^{kl}$ for double differenced observations. 

In this one instance, we have based the calculations directly on a binary data for- 
mat used by Ashtech receivers. The code produces a number of figures to illustrate the 
residuals—it is good for demonstrational purposes. 


Step 4: Joint estimation of ambiguities and baseline vector Section 15.5 describes 
this main step and it yields high precision. We find the preliminary receiver position more 
accurately with bancroft. Then we compute the least squares solution to the complete 
system of observational equations, taking as unknowns the differences dx, dy, dz and all 
the ambiguities. Naturally this solution does not give integer ambiguities. The LAMBDA 
method presented in Section 15.6 iteratively determines the optimal solution (in integers 
for the ambiguities and in reals for the distances). It was devised by Peter Teunissen, and 
we thank Christian Tiberius for generous help in adding it to our M-files. Our complete 
MATLAB subroutine is not quite state-of-the-art but relatively close. 


Step 5: Updates of ambiguities and baseline with new observations This step is 
often referred to as “Kalman filtering.” Strictly speaking we are doing “sequential least 
squares” on a static problem. New rows are added to the observation matrix, while the set 
of unknowns remains the same. A series of M-files will demonstrate how the preliminary 
receiver position and the ambiguities and the final baseline vector are corrected to account 
for each new set of observations. We indicate several of those M-files by name, using 
the letter k or b to indicate a Kalman or Bayes (covariance before gain matrix) update: 


k_point, b_point, k_ud, k_dd3, k_dd4. 


15.2 Coordinate Changes and Satellite Position 


This section connects Earth centered and Earth fixed (ECEF) coordinates X, Y, Z to a 
satellite position described in space by Keplerian orbit elements. First we recall those 
orbit elements $a$, $e$, $\omega$, $\Omega$, $i$, and $\mu$, shown in Figure 15.1. This is unavoidably somewhat 
technical; many readers will proceed, assuming ECEF coordinates are found. 

The X-axis points towards the vernal equinox. This is the direction of the intersection 
between equator and ecliptic. For our purpose this direction can be considered fixed. The 
Z-axis coincides with the spin axis of the Earth. The Y-axis is orthogonal to these two 
directions and forms a right-handed coordinate system. 

The orbit plane intersects the Earth equator plane in the nodal line. The direction in 
which the satellite moves from south to north is called the ascending node K. The angle 
between the equator plane and the orbit plane is the inclination i. The angle at the Earth’s 
center C between the X-axis and the ascending node K is called $\Omega$; it is a right ascension. 
The angle at C between K and the perigee P is called argument of perigee $\omega$; it increases 
counter-clockwise viewed from the positive Z-axis. 

Figure 15.1 The Keplerian orbit elements: semi major axis $a$, eccentricity $e$, inclination 
of orbit $i$, right ascension $\Omega$ of ascending node K, argument of perigee $\omega$, and true 
anomaly $f$. Perigee is denoted P. The center of the Earth is denoted C. 


Figure 15.2 shows a coordinate system in the orbital plane with origin at the Earth’s 
center C. The $\xi$-axis points to the perigee and the $\eta$-axis towards the descending node. 
The $\zeta$-axis is perpendicular to the orbit plane. From Figure 15.2 we read the eccentric 
anomaly $E$ and the true anomaly $f$. Also immediately we have 

$$\begin{aligned}
\xi &= r \cos f = a \cos E - ae = a(\cos E - e) \\
\eta &= r \sin f = \frac{b}{a}\, a \sin E = b \sin E = a \sqrt{1 - e^2}\, \sin E.
\end{aligned}$$

Hence the position vector $r$ of the satellite with respect to the center of the Earth C is 

$$r = \begin{bmatrix} \xi \\ \eta \\ \zeta \end{bmatrix} = \begin{bmatrix} a(\cos E - e) \\ a \sqrt{1 - e^2}\, \sin E \\ 0 \end{bmatrix}. \tag{15.1}$$

Simple trigonometry leads to the following expression for the norm 

$$\|r\| = a(1 - e \cos E). \tag{15.2}$$


In general E varies with time t while a and e are nearly constant. (There are long and short 
periodic perturbations to e, only short for a.) Recall that ||r|| is the geometric distance 
between satellite S and the Earth center C = (0, 0). 

For later reference we introduce the mean motion n which is the mean angular satel- 
lite velocity. If the period of one revolution of the satellite is T we have 


$$n = \frac{2\pi}{T} = \sqrt{\frac{GM}{a^3}}. \tag{15.3}$$

Let $t_0$ be the time the satellite passes perigee, so that $\mu(t) = n(t - t_0)$. Kepler’s famous 
equation relates the mean anomaly $\mu$ and the eccentric anomaly $E$: 

$$E = \mu + e \sin E. \tag{15.4}$$

Figure 15.2 The elliptic orbit with $(\xi, \eta)$ coordinates. The eccentric anomaly $E$ at C. 


From equation (15.1) we finally get 

$$f = \arctan\frac{\eta}{\xi} = \arctan\frac{\sqrt{1 - e^2}\, \sin E}{\cos E - e}. \tag{15.5}$$

By this we have connected the true anomaly $f$, the eccentric anomaly $E$, and the mean 
anomaly $\mu$. These relations are basic for every calculation of a satellite position. 
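
Kepler's equation has no closed-form solution for $E$, so it is solved numerically. The following Python sketch uses Newton's method, which is one common choice and not necessarily what the M-files do, and then evaluates (15.1), (15.2) and (15.5). The function names are ours.

```python
import math


def eccentric_anomaly(mu, e, tol=1e-14, max_iter=50):
    """Solve Kepler's equation E = mu + e*sin(E) by Newton's method."""
    E = mu                                   # good starting value for small e
    for _ in range(max_iter):
        dE = (E - e * math.sin(E) - mu) / (1.0 - e * math.cos(E))
        E -= dE
        if abs(dE) < tol:
            break
    return E


def orbital_plane_position(a, e, mu):
    """Return (xi, eta, f, r) in the orbital plane for mean anomaly mu."""
    E = eccentric_anomaly(mu, e)
    xi = a * (math.cos(E) - e)                         # (15.1), first component
    eta = a * math.sqrt(1.0 - e * e) * math.sin(E)     # (15.1), second component
    f = math.atan2(eta, xi)                            # (15.5), quadrant-safe
    r = a * (1.0 - e * math.cos(E))                    # (15.2)
    return xi, eta, f, r
```

Using `atan2` instead of a plain arctangent keeps the true anomaly in the same quadrant as the eccentric anomaly, which the quotient in (15.5) alone does not guarantee.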

The six Keplerian orbit elements constitute an important description of the orbit, so 
they are repeated in schematic form in Table 15.1. 

It is important to realize that the orbital plane remains fairly stable in relation to 
the geocentric X, Y, Z-system. In other words: seen from space the orbital plane remains 
fairly fixed in relation to the equator. The Greenwich meridian plane rotates around the 
Earth spin axis in accordance with Greenwich apparent sidereal time (GAST), that is with 
a speed of approximately 24h/day. A GPS satellite performs two revolutions a day in its 
orbit having a speed of 3.87 km/s. 

The geocentric coordinates of satellite k at time t; are given as 


$$\begin{bmatrix} X^k(t_j) \\ Y^k(t_j) \\ Z^k(t_j) \end{bmatrix} = R_3(-\Omega_j^k)\, R_1(-i_j^k)\, R_3(-\omega_j^k) \begin{bmatrix} r_j^k \cos f_j^k \\ r_j^k \sin f_j^k \\ 0 \end{bmatrix} \tag{15.6}$$

where $r_j^k = \|r(t_j)\|$ comes from (15.2) with $a$, $e$, and $E$ evaluated for $t = t_j$. However, 
GPS satellites do not follow the presented normal orbit theory. We have to use time dependent, 
more accurate orbit values. They come to us as the so-called broadcast ephemerides 
(practically during the downloading of receiver data). We insert those values in a procedure 
given below and finally we get a set of variables to be inserted into (15.6). 

Obviously the vector is time-dependent and one speaks about the ephemeris (plural: 
ephemerides, accent on “phem”) of the satellite. These are the parameter values at a 
specific time. Each satellite transmits its unique ephemeris data. 


Table 15.1 Keplerian orbit elements: Satellite position 

    $a$         semi major axis                       size and shape of orbit
    $e$         eccentricity
    $\omega$    argument of perigee                   the orbital plane in the
    $\Omega$    right ascension of ascending node     apparent system
    $i$         inclination
    $\mu$       mean anomaly

The parameters chosen for description of the actual orbit of a GPS satellite and its 
perturbations are similar to the Keplerian orbital elements. The broadcast ephemerides are 
calculated using the immediate previous part of the orbit and they predict the following part 
of the orbit. The broadcast ephemerides are accurate to ~ 10m only. For some geodetic 
applications better accuracy is needed. One possibility is to obtain post processed precise 
ephemerides which are accurate at dm-level. 

An ephemeris is intended for use from the epoch $t_{oe}$ of reference counted in seconds 
of GPS week. It is nominally at the center of the interval over which the ephemeris is 
useful. The broadcast ephemerides are intended for use during this period. However they 
describe the orbit to within the specified accuracy for 1.5-5 hours afterward. They are 
predicted by curve fit to 4 to 6 hours of data. The broadcast ephemerides include 

$$\mu_0,\ \Delta n,\ e,\ \sqrt{a},\ \Omega_0,\ i_0,\ \omega,\ \dot\Omega,\ \dot{i},\ C_{\omega c},\ C_{\omega s},\ C_{rc},\ C_{rs},\ C_{ic},\ C_{is},\ t_{oe}$$

where $\dot\Omega = \partial\Omega/\partial t$ and $\dot{i} = \partial i/\partial t$. The coefficients $C_\omega$, $C_r$, and $C_i$ correct argument of 
perigee, orbit radius, and orbit inclination due to inevitable perturbations of the theoretical 
orbit caused by variations in the Earth’s gravity field, albedo and sun pressure, and 
attraction from sun and moon. 

Given the transmit time ¢ (in GPS system time) the following procedure gives the 
necessary variables to use in (15.6): 


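$$\begin{array}{ll}
\text{Time elapsed since } t_{oe} & t_j = t - t_{oe} \\[2pt]
\text{Mean anomaly at time } t_j & \mu_j = \mu_0 + \bigl(\sqrt{GM/a^3} + \Delta n\bigr)\, t_j \\
 & GM = 3.986\,005 \cdot 10^{14}\ \mathrm{m^3/s^2} \\[2pt]
\text{Iterative solution for } E_j & E_j = \mu_j + e \sin E_j \\[2pt]
\text{True anomaly} & f_j = \arctan\dfrac{\sqrt{1 - e^2}\, \sin E_j}{\cos E_j - e} \\[2pt]
\text{Longitude for ascending node} & \Omega_j = \Omega_0 + (\dot\Omega - \omega_e)\, t_j - \omega_e t_{oe} \\
 & \omega_e = 7.292\,115\,147 \cdot 10^{-5}\ \mathrm{rad/s} \\[2pt]
\text{Argument of perigee} & \omega_j = \omega + f_j + C_{\omega c} \cos 2(\omega + f_j) + C_{\omega s} \sin 2(\omega + f_j) \\[2pt]
\text{Radial distance} & r_j = a(1 - e \cos E_j) + C_{rc} \cos 2(\omega + f_j) + C_{rs} \sin 2(\omega + f_j) \\[2pt]
\text{Inclination} & i_j = i_0 + \dot{i}\, t_j + C_{ic} \cos 2(\omega + f_j) + C_{is} \sin 2(\omega + f_j).
\end{array}$$

As usual the mean Earth rotation is denoted $\omega_e$. This algorithm is coded as the M-file 
satpos. The function calculates the position of any GPS satellite at any time. It is fundamental 
to every position calculation. 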
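
To make the sequence of steps concrete, here is a Python sketch of the same procedure. It is not the M-file satpos: the `Ephemeris` container and its field names are hypothetical, Kepler's equation is solved by plain fixed-point iteration, and no correction for a week crossover is made. One interpretive choice is ours: because the corrected angle $\omega_j$ already contains $f_j$, the sketch uses it directly as the angle in the orbital plane before rotating by $i_j$ and $\Omega_j$.

```python
import math
from dataclasses import dataclass

GM = 3.986005e14            # m^3/s^2
OMEGA_E = 7.292115147e-5    # rad/s, mean Earth rotation rate


@dataclass
class Ephemeris:            # hypothetical container for one broadcast ephemeris
    sqrt_a: float
    e: float
    mu0: float
    omega: float
    Omega0: float
    i0: float
    toe: float
    delta_n: float = 0.0
    Omega_dot: float = 0.0
    i_dot: float = 0.0
    C_wc: float = 0.0
    C_ws: float = 0.0
    C_rc: float = 0.0
    C_rs: float = 0.0
    C_ic: float = 0.0
    C_is: float = 0.0


def satellite_position(t, eph):
    """ECEF position (m) of a satellite at GPS time t (seconds of week)."""
    a = eph.sqrt_a ** 2
    tj = t - eph.toe                                   # time elapsed since toe
    mu = eph.mu0 + (math.sqrt(GM / a ** 3) + eph.delta_n) * tj
    E = mu
    for _ in range(30):                                # fixed-point iteration
        E = mu + eph.e * math.sin(E)
    f = math.atan2(math.sqrt(1.0 - eph.e ** 2) * math.sin(E), math.cos(E) - eph.e)
    c2, s2 = math.cos(2.0 * (eph.omega + f)), math.sin(2.0 * (eph.omega + f))
    u = eph.omega + f + eph.C_wc * c2 + eph.C_ws * s2  # corrected in-plane angle
    r = a * (1.0 - eph.e * math.cos(E)) + eph.C_rc * c2 + eph.C_rs * s2
    i = eph.i0 + eph.i_dot * tj + eph.C_ic * c2 + eph.C_is * s2
    Om = eph.Omega0 + (eph.Omega_dot - OMEGA_E) * tj - OMEGA_E * eph.toe
    x1, y1 = r * math.cos(u), r * math.sin(u)          # position in orbital plane
    X = x1 * math.cos(Om) - y1 * math.cos(i) * math.sin(Om)
    Y = x1 * math.sin(Om) + y1 * math.cos(i) * math.cos(Om)
    Z = y1 * math.sin(i)
    return X, Y, Z
```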

In an attempt to realign WGS 84 with the more accurate terrestrial reference frame 
of the International Earth Rotation Service the ten tracking stations have been recoordi- 
nated at the epoch 1994.0. This redefined frame has been designated WGS 84 (G730); 
G stands for GPS derived and 730 for the GPS week number when these modifications 
were implemented. In WGS 84 (G730) the value for $GM$ has been changed to $GM = 
3.986\,004\,418 \cdot 10^{14}\ \mathrm{m^3/s^2}$. 

The M-file satposin computes satellite positions in an inertial frame where $\omega_e = 0$. 

All information about the Kepler elements is contained in the ephemerides. Files 
containing broadcast ephemerides are created when downloading the GPS observations to 
a PC. These files most often are in a receiver specific binary format. Fortunately these 
formats can be converted into the RINEX format (described in the back of this book). The 
third character in the extension of such files is always n (for navigation). 

Often we only need part of the information in the navigation file for the ephemeris 
file. The selection and reformatting is done via the following commands: 


rinexe(ephemerisfile, outputfile); 
eph = get_eph(outputfile); 
satp = satpos(t,eph); 


Some comments should be added on timekeeping in the GPS environment. GPS time 
counts in weeks and seconds of week starting on January 6, 1980. Each week has its own 
number. Time within a week is counted in seconds from the beginning at midnight between 
Saturday and Sunday (day 1 of the week). 

The count of seconds goes up to 7 x 24 x 60 x 60 = 604800. GPS calculations 
need to keep track of nanoseconds, so professional software often splits the second into an 
integer and a decimal. 

In the sample code above, satpos needs a time parameter t in seconds of week. All 
epoch times in the RINEX observation files use time in the format: year, month, day, hour, 
minutes, and seconds. So basically we need a conversion from this date-format to GPS 
time (week number and seconds of week). Traditionally geodesists and astronomers solve 
this problem by introducing the (modified) Julian Day Number. This is an elegant way 
to circumvent the problem created by months of various lengths. JD simply counts the 
days consecutively since January 1, 4713 BC. The modified Julian Date MJD, starting at 
midnight, is defined as 

$$\mathrm{MJD} = \mathrm{JD} - 2\,400\,000.5.$$

The M-file gps_time finds GPS time (week w and seconds of week sow): 


t = julday(1997,2,10,20); % year, month, day, hour 
[w,sow] = gps_time(t) 
w = 
892 
sow = 
158400 


To avoid under- or overflow at the beginning or end of a week we use the M-file check_t. 
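
The conversion from calendar date to GPS time can be sketched in a few lines of Python. These are not the M-files julday and gps_time; the function names are ours. The Julian Day formula used here is the common closed expression that is valid for dates between March 1900 and February 2100, and the Julian Day of the GPS epoch (January 6, 1980, 0h) is 2 444 244.5.

```python
import math

JD_GPS_EPOCH = 2444244.5      # Julian Day of January 6, 1980, 0h


def julian_day(year, month, day, hour=0.0):
    """Julian Day for a calendar date; valid from March 1900 to February 2100."""
    if month <= 2:            # count January and February as months 13 and 14
        year -= 1
        month += 12
    return (math.floor(365.25 * year) + math.floor(30.6001 * (month + 1))
            + day + hour / 24.0 + 1720981.5)


def gps_week_and_seconds(jd):
    """GPS week number and seconds of week for a Julian Day."""
    days = jd - JD_GPS_EPOCH
    week = int(days // 7)
    seconds_of_week = (days - 7 * week) * 86400.0
    return week, seconds_of_week


jd = julian_day(1997, 2, 10, 20)
print(jd - 2400000.5)                 # modified Julian Date
print(gps_week_and_seconds(jd))       # week 892, about 158400 s
```

A double precision Julian Day resolves time only to some tens of microseconds, which is one reason why professional software keeps the integer and fractional parts of the second apart.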


15.3 Receiver Position from Pseudoranges 


We start with a nice and very useful application which combines the least-squares method 
and a searching technique. The procedure solves a least-squares problem without forming 
normal equations. Here we find the minimum of the sum of squared residuals through 
searching. 

Assume four or more pseudoranges P1, P2,..., Pm are given in the same epoch. We 
want to determine the position of our receiver, with no a priori knowledge of its location. 

First we compute the ECEF coordinates of all m satellites from the ephemeris file. 
Then we transform the Cartesian coordinates of each satellite at transmission time (which 
can be obtained from the pseudorange) to (g;, A;) = (latitude, longitude) and average 
those coordinates. This average is our first guess of the receiver location. 

Taking this point as a center we introduce a grid covering a hemisphere. There may
be ten equal radial subdivisions out to $\pi/2$ and possibly 16 equal subdivisions of the $2\pi$
angle. At each grid point we calculate a residual $R_i = P_i - P_1$, $i = 2, \ldots, m$, as the
difference between the observed pseudorange and the first value to eliminate the receiver
clock offset. Next we difference these differences; the superscript 0 marks the
corresponding value computed for the grid point:

$$
\begin{aligned}
r_1 &= R_2 - R_2^0 = (P_2 - P_1) - (P_2^0 - P_1^0) \\
r_2 &= R_3 - R_3^0 = (P_3 - P_1) - (P_3^0 - P_1^0) \\
&\;\;\vdots \\
r_{m-1} &= R_m - R_m^0 = (P_m - P_1) - (P_m^0 - P_1^0).
\end{aligned}
$$

This gives a first approximation to the sum $S = \sum r_i^2$. Among all the possible values
for $S$, we seek the smallest one; the gridpoint connected to this value is the new guess for
the location of the receiver. The radial grid is subdivided for each iteration. The procedure
is repeated until no further improvement in position is achieved.
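The quantity minimized by the search can be sketched as follows. This is a minimal illustration under stated assumptions, not the M-file recpos: the function names search_cost and best_grid_point are hypothetical, satellite positions and grid points are plain ECEF triples in meters, and refraction and Earth rotation are ignored.

```python
import math


def search_cost(sat_positions, pseudoranges, candidate):
    """Sum S of squared differenced residuals at one candidate point."""
    computed = [math.dist(s, candidate) for s in sat_positions]
    total = 0.0
    for i in range(1, len(sat_positions)):
        observed_diff = pseudoranges[i] - pseudoranges[0]  # R_i = P_i - P_1
        computed_diff = computed[i] - computed[0]          # same difference at the grid point
        total += (observed_diff - computed_diff) ** 2
    return total


def best_grid_point(sat_positions, pseudoranges, grid_points):
    """Return the grid point with the smallest S."""
    return min(grid_points,
               key=lambda g: search_cost(sat_positions, pseudoranges, g))
```

Because every residual is a difference against the first pseudorange, a common receiver clock offset drops out of S, which is the reason for forming the differences in the first place.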

Finally we show how to calculate the new position by spherical trigonometry, as in
Figure 15.3. Let the coordinates of the original point be $(\varphi_1, \lambda_1)$. We want to move $\psi$
degrees along the azimuth $\alpha$ to the new point $(\varphi_2, \lambda_2)$. Computing $\varphi_2$ and $\lambda_2$ is a classical
geodetic problem and the following basic equations yield a solution on the sphere:

$$
\sin\varphi_2 = \sin\varphi_1\cos\psi + \cos\varphi_1\sin\psi\cos\alpha
\quad\text{and}\quad
\sin\Delta\lambda = \frac{\sin\alpha\,\sin\psi}{\cos\varphi_2}. \tag{15.7}
$$

Figure 15.3 Spherical triangle for calculation of receiver latitude and longitude $(\varphi, \lambda)$.

These equations give the latitude $\varphi_2$ of the new point and the increment $\Delta\lambda$ in longitude.
The M-file recpos shows a fast implementation of the code. Eventually one finds a receiver 
position consistent with precision of the orbit, the refraction model, and the pseudoranges. 
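Equation (15.7) translates directly into code. The sketch below is an illustration in Python, with a hypothetical function name and angles in degrees; because it uses the arcsine, it recovers longitude increments only between −90° and +90°, which is an assumption that suits the short moves of the grid search.

```python
import math


def move_on_sphere(phi1_deg, lam1_deg, psi_deg, alpha_deg):
    """Move psi degrees along azimuth alpha from (phi1, lam1) on a sphere."""
    phi1, psi, alpha = (math.radians(v) for v in (phi1_deg, psi_deg, alpha_deg))
    sin_phi2 = (math.sin(phi1) * math.cos(psi)
                + math.cos(phi1) * math.sin(psi) * math.cos(alpha))
    phi2 = math.asin(max(-1.0, min(1.0, sin_phi2)))       # clamp against rounding
    sin_dlam = math.sin(alpha) * math.sin(psi) / math.cos(phi2)
    dlam = math.asin(max(-1.0, min(1.0, sin_dlam)))
    return math.degrees(phi2), lam1_deg + math.degrees(dlam)
```

Moving due north (azimuth 0) changes only the latitude, and moving due east from the equator changes only the longitude, which gives two quick checks of the signs.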

This is an example of another look at least squares. Here, since $P_1$ is subtracted
from every other pseudorange, the differences are obviously correlated. Any error in $P_1$
is present in each of the differences. In the search we ignore these correlations since the 
procedure is designed only to give a first guess of the receiver location. 

A least-squares procedure using the proper covariance matrix should follow our 
search technique. The resulting guess is almost always within the (linear) convergence 
region. Obtain the navigation data from rinexe('ohiostat.96n','rinex_n.dat'). You may
zoom in on the figure by pressing your mouse button. After many zooms you may read off 
the preliminary position on the x- and y-labels. 

The M-file get_eph opens, reads and reshapes an ephemerides file. Such a file usu- 
ally contains several data sets for a particular satellite. It is common practice to select the 
ephemeris that is immediately before the epoch of use. The file find_eph does this. Keep- 
ing to this practice, however, eventually leads to a change of ephemeris data. This most 
probably will introduce a jump in the calculated orbit. More advanced programs smooth 
orbits, if they have to be exploited over longer periods of time. An even better solution is 
to change to precise ephemerides. 


15.4 Separate Ambiguity and Baseline Estimation 


After downloading the recorded data stored in a GPS-receiver we most often have a file 
containing observations as well as a file containing the ephemerides. Sometimes there is 
also an auxiliary file with meteorological data and information about the site. 
One procedure is to use these receiver-dependent files in the further data processing.
Another procedure is to convert to the RINEX format. Here we start out from a set of
receiver-dependent files (from Ashtech receivers).

The M-file bdata reads two observation files, extracts the information necessary,
and stores it in the binary file bdata.dat. Only data contemporary for master and rover
are stored; they are receiver time, satellite number, code on L1, phase on L1, code on L2,
phase on L2, and elevation angle. Typical calls for binary and ephemeris data are


bdata('b0810a94.076','b0005a94.076') and edata('e0810a94.076').


To reformat edata.dat into a matrix with 21 rows and as many columns as there are
ephemerides make the call eph = get_eph('rinex_n.dat'). Probably a given satellite has
more than one ephemeris. The proper column in the matrix picks the ephemeris just before
or equal to time. For this column use icol = find_eph(eph,sv,time).
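The selection rule can be sketched as below. This is an illustration only and not the M-file find_eph: the ephemerides are represented by a hypothetical list of (satellite number, reference time) pairs instead of the 21-row matrix, and falling back to the earliest record when none precedes the epoch is an assumption.

```python
def find_eph(ephemerides, sv, time):
    """Index of the ephemeris of satellite sv with the latest reference time <= time.

    ephemerides: list of (sv, toe) pairs, one per column of the matrix.
    Returns None if the satellite has no ephemeris at all.
    """
    own = [(toe, col) for col, (eph_sv, toe) in enumerate(ephemerides)
           if eph_sv == sv]
    if not own:
        return None
    earlier = [item for item in own if item[0] <= time]
    if earlier:
        return max(earlier)[1]   # latest ephemeris not after the epoch
    return min(own)[1]           # assumed fallback: earliest available record
```

With records for satellite 26 at 7200 s, 14400 s and 21600 s, an epoch of 15000 s selects the record at 14400 s.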

At the end of the main M-file ash_base we need information about the antenna
heights. This information is contained in the original s-files. They may be converted to
MATLAB-format by sdata('s0810a94.076','s0005a94.076').

The M-file ash_dd starts by computing the means of the elevation angles of the
individual satellites. It counts the number of epochs in which the satellite appears. Then 
a cut-off angle is selected and all satellites with mean elevation smaller than this angle 
are deleted. Next a reference satellite is chosen as the one which appears in most epochs. 
Finally all data are double differenced. The M-file ash_dd is common to several M-files 
using the supplied data set. 


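The first step in finding receiver position is to estimate the ambiguities $N_1$ and $N_2$. The
observation equations are

$$
\begin{aligned}
P_1 &= \rho^* + I - e_1 \\
\Phi_1 &= \rho^* - I + \lambda_1 N_1 - \varepsilon_1 \\
P_2 &= \rho^* + (f_1/f_2)^2 I - e_2 \\
\Phi_2 &= \rho^* - (f_1/f_2)^2 I + \lambda_2 N_2 - \varepsilon_2.
\end{aligned} \tag{15.8}
$$

Actually, we have $f_1/f_2 = 77/60 = 1.283\,333\ldots$. We assume we are processing a short
baseline and consequently put $I = 0$. So equation (15.8) can be transformed into the
elegant matrix equation

$$
\begin{bmatrix} 1 & 0 & 0 \\ 1 & \lambda_1 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & \lambda_2 \end{bmatrix}
\begin{bmatrix} \rho^* \\ N_1 \\ N_2 \end{bmatrix}
=
\begin{bmatrix} P_1 \\ \Phi_1 \\ P_2 \\ \Phi_2 \end{bmatrix} - e. \tag{15.9}
$$

This is four equations in three unknowns: the ideal pseudorange and the two ambiguities
on L1 and L2. The weight matrix is

$$
C = \begin{bmatrix} 1/0.3^2 & & & \\ & 1/0.005^2 & & \\ & & 1/0.3^2 & \\ & & & 1/0.005^2 \end{bmatrix}.
$$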
The standard deviation of phase error is a few millimeters; for the pseudorange this is
receiver dependent. C/A code pseudoranges on L1 can have noise values up to 2 to 3 m.
This is due to the slow chipping rate, which is just another term for frequency. The P code
has a frequency of 10.23 MHz, i.e. a sequence of 10.23 million binary digits or chips per
second. The chipping rate of the P code is ten times higher than that of the C/A code,
and this implies an error of 10 to 30 cm.

For each epoch we add the relevant contribution to the normals and only solve the 
system when all observations are read. We use a little more sophisticated code as we 
eliminate the ideal pseudorange according to the method described in Section 11.7. 

The least-squares solution consists of two reals $n_1$ and $n_2$ from which we have to
recover two integers $N_1$ and $N_2$. Here we follow a method indicated by Clyde Goad. The
estimated difference $n_1 - n_2$ is rounded to the nearest integer and named $K_1$. The rounded
value of $60n_1 - 77n_2$ we call $K_2$. The best integer estimates for $N_1$ and $N_2$ are then found
as the solution

$$
N_2 = (60K_1 - K_2)/17 \tag{15.10}
$$
$$
N_1 = N_2 + K_1. \tag{15.11}
$$

The values for $K_1$ and $K_2$ are not free of error, but only particular combinations yield
integer solutions for $N_1$ and $N_2$. Gradually these estimates improve as more epochs are
processed. The numbers $K_1$ and $K_2$, in theory, become more reliable.
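Equations (15.10) and (15.11) follow from $K_1 = N_1 - N_2$ and $K_2 = 60N_1 - 77N_2$, since $60K_1 - K_2 = 17N_2$. A short Python sketch of the recovery step is given below; the function name and the sample float values are hypothetical and serve only to illustrate the arithmetic.

```python
def goad_integers(n1, n2):
    """Recover candidate integers N1, N2 from float ambiguities n1, n2."""
    k1 = round(n1 - n2)              # K1: rounded narrow difference
    k2 = round(60 * n1 - 77 * n2)    # K2: rounded 60/77 combination
    big_n2 = (60 * k1 - k2) / 17     # (15.10)
    big_n1 = big_n2 + k1             # (15.11)
    return big_n1, big_n2


def is_integer_pair(big_n1, big_n2, tol=1e-9):
    """True when both recovered values are whole numbers."""
    return (abs(big_n1 - round(big_n1)) < tol
            and abs(big_n2 - round(big_n2)) < tol)
```

When $K_1$ and $K_2$ are inconsistent, the division by 17 leaves a fraction. This is how the method signals that the float estimates are not yet good enough.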


Above we put $I = 0$. Now we are more familiar with the observables and we can investigate
this assumption. Let us emphasize once more that we are dealing with double
differenced observations. We repeat the second and fourth observation equations in (15.8):

$$
\Phi_1 = \rho^* - I + \lambda_1 N_1 - \varepsilon_1
$$
$$
\Phi_2 = \rho^* - (f_1/f_2)^2 I + \lambda_2 N_2 - \varepsilon_2.
$$

Ignoring the error terms and eliminating $\rho^*$ gives an expression for the ionospheric delay

$$
I = \frac{(\Phi_2 - \lambda_2 N_2) - (\Phi_1 - \lambda_1 N_1)}{1 - (f_1/f_2)^2}. \tag{15.12}
$$

This variable is plotted in Figure 15.4. It scatters within a few cm. So it is a matter of
testing if the condition $I = 0$ is accepted or not.

We can eliminate the ionospheric term $I$ from the two equations. The result is

$$
60\Phi_1/\lambda_1 - 77\Phi_2/\lambda_2 = 60N_1 - 77N_2. \tag{15.13}
$$

The coefficients 60 and 77 appear because $f_1/f_2 = 154/120 = 77/60$. However, a drawback of the
large coefficients 60 and 77 is that they amplify the noise.
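The ionospheric delay of (15.12) can be evaluated epoch by epoch once the ambiguities are fixed. The following Python sketch is an illustration with assumed constants: the carrier frequencies are taken as 154 and 120 times 10.23 MHz, the wavelengths are computed from the speed of light, and the function name is hypothetical.

```python
C_LIGHT = 299792458.0          # speed of light in m/s
F1 = 154 * 10.23e6             # L1 carrier frequency in Hz
F2 = 120 * 10.23e6             # L2 carrier frequency in Hz
LAMBDA1 = C_LIGHT / F1         # L1 wavelength in m
LAMBDA2 = C_LIGHT / F2         # L2 wavelength in m


def iono_delay(phi1, phi2, n1, n2):
    """Ionospheric delay from double differenced phases in meters, as in (15.12)."""
    ratio_sq = (F1 / F2) ** 2
    return ((phi2 - LAMBDA2 * n2) - (phi1 - LAMBDA1 * n1)) / (1 - ratio_sq)
```

Plotting this value for every epoch of one satellite pair gives the kind of scatter shown in Figure 15.4.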

The final step is the baseline estimation. In order to operate with correct geometry 
we need good approximate coordinates for the master station. We also need to set prelimi- 
nary values for the baseline vector. Amazingly enough it works to set it to the zero vector. 

We start from equation (14.20):

$$
\rho_i^k - \rho_i^l - \rho_j^k + \rho_j^l = \Phi_{q,ij}^{kl} - T_{ij}^{kl} - \lambda_q N_{q,ij}^{kl} - \text{noise}. \tag{15.14}
$$
Figure 15.4 Ionospheric delay for double differenced phases from (15.12). Horizontal axis: epochs for observations PRN 26 minus PRN 27 (epoch interval 20 s).


The wavelength $\lambda_q$ and the ambiguities $N_q$ are assumed known; $q = 1$ refers to frequency
L1 and $q = 2$ refers to L2. The M-file tropo models the tropospheric delay $T$. We
denote the double differenced phase observations multiplied by the wavelength $\lambda_q$ by $\Phi_q$.

Our model assumes that the observational data do not suffer from cycle slips or resets
of receiver clocks. Some receiver manufacturers typically reset clocks by 1 ms to avoid
clock drifts larger than this amount. Both sources cause code and phase observations to
violate the condition

$$
\Delta\Phi(j) = \Delta P(j) \quad\text{or}\quad \Phi(j) - \Phi(1) = P(j) - P(1)
$$

where $\Delta$ means change over time. Apart from the random errors of the pseudorange
observation this condition should be fulfilled for all values of $j$ with

$$
P(j) = \alpha_1 P_1(j) + \alpha_2 P_2(j)
$$
$$
\Phi(j) = \alpha_1 \Phi_1(j) + \alpha_2 \Phi_2(j), \qquad \alpha_1 = \frac{f_1^2}{f_1^2 - f_2^2} = 1 - \alpha_2.
$$

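For concreteness we take point $i$ (= 810) as a known reference. The coordinates of the
other point $j$ (= 005) are unknown. The reference satellite is $k = 26$ and the index $l$
runs over five more satellites 2, 9, 16, 23, and 27. Linearizing equation (15.14) brings the
Jacobian matrix $J$ from the derivatives of the double difference on the left side:

$$
Jx = \begin{bmatrix}
u_j^{2} - u_j^{26} \\
u_j^{9} - u_j^{26} \\
u_j^{16} - u_j^{26} \\
u_j^{23} - u_j^{26} \\
u_j^{27} - u_j^{26}
\end{bmatrix}
\begin{bmatrix} x_j \\ y_j \\ z_j \end{bmatrix} \tag{15.15}
$$

where

$$
u_j^k = \left( \frac{X^k_{\mathrm{ECEF}} - X_j}{\rho_j^k},\;
\frac{Y^k_{\mathrm{ECEF}} - Y_j}{\rho_j^k},\;
\frac{Z^k_{\mathrm{ECEF}} - Z_j}{\rho_j^k} \right).
$$

The original satellite coordinates are rotated by $\omega_e \tau_j^k$. Here $\omega_e$ denotes the rotation rate of
the Earth ($7.292\,115\,147 \cdot 10^{-5}$ rad/s) and $\tau_j^k = \rho_j^k/c$. According to (14.7) we get:

$$
\begin{bmatrix} X^k_{\mathrm{ECEF}} \\ Y^k_{\mathrm{ECEF}} \\ Z^k_{\mathrm{ECEF}} \end{bmatrix}
=
\begin{bmatrix}
\cos(\omega_e \tau_j^k) & \sin(\omega_e \tau_j^k) & 0 \\
-\sin(\omega_e \tau_j^k) & \cos(\omega_e \tau_j^k) & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} X^k \\ Y^k \\ Z^k \end{bmatrix}. \tag{15.16}
$$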
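The rotation in (15.16) accounts for the turning of the Earth while the signal travels. A minimal Python sketch follows; the function name is hypothetical, and the travel time is assumed to be supplied by the caller as range divided by the speed of light.

```python
import math

OMEGA_E = 7.292115147e-5   # Earth rotation rate in rad/s, as quoted in the text


def earth_rotation_correction(sat_xyz, travel_time):
    """Rotate satellite coordinates about the Z axis by omega_e * travel_time."""
    angle = OMEGA_E * travel_time
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = sat_xyz
    return (c * x + s * y, -s * x + c * y, z)
```

For a travel time near 0.07 s the rotation angle is about 5 microradians, which moves a satellite at GPS altitude by tens of meters, so the correction cannot be skipped.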


On the right side of the linearized equations are the zero-order terms

$$
c_{ij}^{kl} = \Phi_{q,ij}^{kl} - \lambda_q N_{q,ij}^{kl} - (\rho_i^k)^0 + (\rho_i^l)^0 + (\rho_j^k)^0 - (\rho_j^l)^0. \tag{15.17}
$$

The index $q = 1, 2$ corresponds to observations on frequencies L1 and L2. There is an
equation for each receiver $j$ and satellite $l$, except the reference $i = 810$ and $k = 26$.

The least-squares solution of the complete system $Jx = c$ yields the position vector.
The covariance for both $\Phi_1$ and $\Phi_2$ is $\Sigma_d = D_d D_d^{\mathrm T}$ in accordance with (14.30).

The algorithm yields exciting (and good) results with good P code and phase data. 
Nevertheless we have included a simple test for outliers. In each epoch we ask if the 
weighted square sum of residuals of the observations exceeds a given multiple of the pre- 
vious iteration mean square. If that is the case we ignore data from that epoch. 

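The initial estimate $x$ for the baseline from site 810 to site 005 has components

$$
\begin{aligned}
\delta X &= -941.313\ \text{m} & \sigma_X &= 0.002\ \text{m} \\
\delta Y &= 4496.358\ \text{m} & \sigma_Y &= 0.001\ \text{m} \\
\delta Z &= 112.053\ \text{m} & \sigma_Z &= 0.002\ \text{m}.
\end{aligned}
$$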


The observed slope distance $h_s$ of the antenna must be reduced to a vertical
distance $h_a$. A radius $r$ of the antenna yields $h_a^2 = h_s^2 - r^2$. In the present situation
$h_s = 1.260$ m at site 810, and $h_s = 1.352$ m at site 005. The corresponding values for $h_a$
are 1.254 m and 1.346 m with $r = 0.135$ m. The vertical distances refer to the topocentric
coordinate system and they must be transformed into differences in $X$, $Y$, and $Z$. That happens
by introducing the vertical vector $u$ as defined in equation (10.30):

$$
h_a u = h_a \begin{bmatrix} \cos\varphi\cos\lambda \\ \cos\varphi\sin\lambda \\ \sin\varphi \end{bmatrix}. \tag{15.18}
$$

Remember that $u$ is the third column of $R$. The difference in antenna heights contributes

$$
\begin{bmatrix} \delta X_a \\ \delta Y_a \\ \delta Z_a \end{bmatrix}
= (h_m - h_r)u
= \begin{bmatrix} -0.050 \\ -0.008 \\ -0.077 \end{bmatrix}.
$$

This has to be added to the estimated difference vector, so the final baseline estimate is

$$
X_{\mathrm{ECEF}} = \begin{bmatrix} \delta X + \delta X_a \\ \delta Y + \delta Y_a \\ \delta Z + \delta Z_a \end{bmatrix}
= \begin{bmatrix} -941.263 \\ 4496.350 \\ 111.976 \end{bmatrix}.
$$


This compares closely to the correct vector from site 810 to site 005. That is a fairly 
elementary treatment of positioning, to be improved. 


Tropospheric delay 


A simple empirical model for the tropospheric delay $dT$ of GPS signals received at a
position with the latitude $\varphi$, at a zenith distance of $z = 90^\circ - h$ and with air pressure $P_0$ in
millibars at height $H$ in km, at temperature $T_0$ in kelvin and partial pressure of water vapor $e_0$ in
millibars is

$$
dT = 0.002\,277\;\frac{1 + 0.002\,6\cos 2\varphi + 0.000\,28H}{\cos z}
\left( P_0 + \left( \frac{1255}{T_0} + 0.05 \right) e_0 \right). \tag{15.19}
$$

This simple model can be extended in various ways. As we only intend to determine
short baselines, and to cm-level accuracy, we have implemented the M-file tropo
which easily fulfills this demand. We compute the tropospheric zenith delay to be 2.4 m.
And we read from the formula that the delay is inversely proportional to $\cos z$. In
processing GPS observations you often select a cut-off angle of 15°.
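The simple model can be tried out numerically. The Python sketch below is an illustration only and is not the M-file tropo: it assumes the constants of the Saastamoinen-type formula (0.002 277, 0.002 6, 0.000 28, 1255 and 0.05), and the function name and the sample atmosphere in the usage lines are hypothetical.

```python
import math


def tropo_delay(lat_deg, zenith_deg, pressure_mb, temp_k, vapor_mb, height_km=0.0):
    """Tropospheric delay in meters from the simple empirical model."""
    factor = (1 + 0.0026 * math.cos(2 * math.radians(lat_deg))
              + 0.00028 * height_km)
    wet_and_dry = pressure_mb + (1255 / temp_k + 0.05) * vapor_mb
    return 0.002277 * factor / math.cos(math.radians(zenith_deg)) * wet_and_dry


zenith = tropo_delay(57.0, 0.0, 1013.0, 288.0, 12.0)
low = tropo_delay(57.0, 75.0, 1013.0, 288.0, 12.0)   # at the 15 degree cut-off
print(zenith, low)
```

At the 15° cut-off angle the factor 1/cos z is close to 4, which shows why low satellites are usually excluded.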

The M-file tropo needs the following parameters: sinel, the sine of the elevation angle of the
satellite; hsta, the height of the station in km; p, the atmospheric pressure in mb at height hp; tkel, the
surface temperature in kelvin at height htkel; hum, the humidity in % at height hhum;
hp, the height of the pressure measurement in km; htkel, the height of the temperature measurement in km;
and hhum, the height of the humidity measurement in km. Our code may compete even with the
most precise and recent ones published.

The zenith delay is known to within about 2% uncertainty or better. For zenith distances
smaller than 75° this uncertainty does not increase very much. If all pseudodistances
have a constant bias, this would not affect the position calculation but just add to the receiver
clock offset. If the tropospheric delay is not constant, it tends to affect the
height of the point. Yet, in double differencing for short baselines this is not a serious error.


Adjustment of Vectors 


When the individual vectors have been processed they have to be tied together through 
a least-squares procedure to establish a network. Each vector estimates a difference of 
Cartesian coordinates between two points. The observation equations are described by an
incidence matrix and therefore are very much like a three-dimensional leveling problem. 
We only need to know coordinates for the fixed points. The M-file v_loops described in 
Example 8.5 identifies all possible loops of three-dimensional vectors. This is useful when 
calculating the closure errors of loops and for a later least-squares estimation of station 
coordinates. 
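The closure error of a loop is simply the signed sum of its vectors. The sketch below illustrates this in Python; it is not the M-file v_loops, the data layout (a dictionary keyed by station pairs) is an assumption, and the three sample vectors are invented for the example.

```python
def loop_closure(vectors, loop):
    """Sum the vectors around a loop given as a list of (from, to) station pairs.

    vectors maps (from, to) to a (dX, dY, dZ) tuple. A vector observed in the
    opposite direction is used with reversed sign.
    """
    total = [0.0, 0.0, 0.0]
    for start, end in loop:
        if (start, end) in vectors:
            vec, sign = vectors[(start, end)], 1.0
        else:
            vec, sign = vectors[(end, start)], -1.0
        for axis in range(3):
            total[axis] += sign * vec[axis]
    return tuple(total)


# Hypothetical observed vectors in meters
observed = {
    ("A", "B"): (100.000, 50.000, -20.000),
    ("B", "C"): (-40.000, 30.000, 10.000),
    ("A", "C"): (60.003, 79.998, -10.001),
}
print(loop_closure(observed, [("A", "B"), ("B", "C"), ("C", "A")]))
```

A closure of a few millimeters, as in this invented example, is what consistent vectors produce; a wrong antenna height or point identification shows up as a much larger value.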
The processing of the individual vectors results in a covariance matrix for the three
components of the vector. All 3 by 3 covariance matrices make up a block diagonal weight 
matrix for the least-squares adjustment. 

Often there are many vectors and sometimes also multiple observations of the same 
vector. The software has to detect and handle gross errors automatically. The most frequent 
errors are wrong antenna heights and wrong point identifications. As a minimum the
output must consist of WGS 84 coordinates, a plot of the network possibly containing
confidence ellipses, and a labelling of non-checked vectors. 

If geoidal heights are available, the output also can be in form of projection coordi- 
nates and heights above sea level. Scale of projection and meridian convergence at newly 
estimated points can be useful. 

In practice it is also very useful if the software incorporates terrestrial observations. 
A few commercial software packages do this.


15.5 Joint Ambiguity and Baseline Estimation 


Our most comprehensive and flexible code for baseline estimation performs the estimation
of baseline as well as ambiguities at the same time. The code is activated by the call
proc_dd('pta.96o','ptb.96o') where the files pta.96o and ptb.96o are RINEX observation
files. We start from files in RINEX format.

However, as always we have to bring the ephemerides file in order. This is done by
the call rinexe('pta.96n','pta.nav').

We start by analyzing the header of the observation file from the master receiver:
anheader('pta.96o'). The result is a list composed of some of the following abbreviations:


L1, L2    Phase measurements on L1 and L2
C1        Pseudorange using C/A code on L1
P1, P2    Pseudorange using P code on L1, L2
D1, D2    Doppler frequency on L1 and L2


Next we open the master observation file and read until we find the first epoch flag equal
to 0. This indicates static observations (flag 2 means start of moving antenna, and flag 3
is new site occupation). The file fepoch_0('pta.96o') reads continuously epoch times in
the master file and compares with epoch times in the rover file until they match. When this
happens we read observations of NoSv satellites by


grabdata(fid1,NoSv1,NoObs_types1)
grabdata(fid2,NoSv2,NoObs_types2)


The observations are transformed into equations which in turn are contributing to the nor- 
mals. Double differenced observations are highly correlated, but the technique described in 
Section 11.8 decorrelates them before adding to the normals. That is the only valid method 
for adding the individual contributions from a single observation. Finally the normals are 
solved to produce estimates for the baseline components and the ambiguities. 
In this code we deal with the antenna offsets in an untraditional way. A common
procedure is to compute the baseline between the two antennas and correct for the antenna
offset at each station. However we introduce a nominal phase center related to the actual
marker by the offset vector $dx = (dH, dE, dN)$. The vector $dx$ is expressed in topocentric
coordinates but has to be converted to the ECEF system. Next we calculate the
distance $D$ from the nominal phase center rather than the marker point:

$$
D = \| X_{\mathrm{sat}} - (X_{\mathrm{marker}} + dx) \|.
$$

The distance $D$ has to be subtracted from the calculated pseudorange $\rho_{j1}$ when forming
the right side. Let the observed pseudorange be $P_1$, and correct the calculated geometric
distance from satellite $j$ to phase center 1 by $D$:

$$
\text{corrected\_obs}_{j1} = P_1 + (\rho_{j1} - D).
$$


The last step of setting up the observation equations and normal equations is quite 
complex. This is partly due to the fact that we do not want to omit any valid observation. 
In the M-file ash_dd we shaped the observations with brute force as to include only those 
satellites which were observed from the first to the last epoch. Here we want to be more 
flexible and include all observations recorded in the RINEX file in any epochs. 

This flexibility implies that we must be very careful in building the normals. The first 
three unknowns correspond to coordinate increments dx, dy, and dz of the baseline; the 
subsequent unknowns are ambiguities. Each time we encounter a new ambiguity we allocate
a new position and increment the number of unknowns by one. All the necessary
bookkeeping is done by the M-file locate. The file accum0 adds the individual contributions
from the observation equations to the normal equations.
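The bookkeeping can be pictured with a small Python sketch. It is an illustration of the idea only, not the M-files locate and accum0: the class UnknownIndex, the function accumulate, and the sparse dictionary storage of the normal equations are all hypothetical choices.

```python
class UnknownIndex:
    """Assigns a position in the vector of unknowns to each new ambiguity."""

    def __init__(self):
        self.positions = {}
        self.count = 3            # positions 0, 1, 2 hold dx, dy, dz

    def locate(self, satellite, frequency):
        key = (satellite, frequency)
        if key not in self.positions:      # first time this ambiguity appears
            self.positions[key] = self.count
            self.count += 1
        return self.positions[key]


def accumulate(normals, rhs, row, observation, weight=1.0):
    """Add one observation equation to sparse normal equations.

    row maps the position of an unknown to its coefficient.
    """
    for i, a_i in row.items():
        rhs[i] = rhs.get(i, 0.0) + weight * a_i * observation
        for j, a_j in row.items():
            normals[(i, j)] = normals.get((i, j), 0.0) + weight * a_i * a_j
```

Because positions are allocated on demand, a satellite that rises in the middle of the session simply receives the next free index and no observation has to be discarded.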


15.6 The LAMBDA Method for Ambiguities 


The letters in LAMBDA stand for Least-squares AMBiguity Decorrelation Adjustment. 
The problem arises in highly accurate positioning by GPS. The carrier phase measurements 
give a very precise value for the fraction (in the number of wavelengths from satellite to 
receiver), but these measurements do not directly yield the integer. For each satellite- 
receiver pair, the problem is to determine this “ambiguity.” Once we find the integer, we 
can generally track it; a cycle slip is large enough to detect. (Recall that the wavelengths
for the L1 and L2 frequencies are 0.1905 m and 0.2445 m.) The problem is to find the
integer in the first place. 

Example 15.3 describes the “one-way” method of Euler and Goad, using the M-file 
one_way. In many cases this calculates the ambiguities, quickly and easily. The more 
powerful LAMBDA method was developed by Peter Teunissen (1993 to 1996) especially
for networks with long baselines. Its MATLAB implementation has been clearly described 
by de Jonge & Tiberius (1996). We are grateful to Christian Tiberius for his help in adding 
the M-files to the library for this book. 

The information at our disposal is from phase observations. To remove contamination
by clock errors, we generally form double differences. Each component of the observation
vector $b$ involves two receivers $i$, $j$ and two satellites $k$, $l$. The double difference of
carrier phase measurements at a specific time (epoch $t$) is

$$
b_{ij}^{kl}(t) = \bigl(\Phi_i^k(t) - \Phi_j^k(t)\bigr) - \bigl(\Phi_i^l(t) - \Phi_j^l(t)\bigr). \tag{15.20}
$$

The vector $b$ contains $m$ measurements, and the vector $I$ contains the $n$ integer unknowns:

$$
I_{ij}^{kl} = (I_i^k - I_j^k) - (I_i^l - I_j^l). \tag{15.21}
$$

There are other unknowns of great importance! Those are the baseline coordinates $X_i - X_j$
and $Y_i - Y_j$ and $Z_i - Z_j$ that we set out to compute. They are real numbers, not integers, and
they go into a vector $x$ with $p$ components. For one baseline $p = 3$. Then the linearized
double difference equations are


b= Ax + GI + noise. (15.22) 


$A$ is an $m$ by $p$ matrix and $G$ is $m$ by $n$. The matrix $A$ relates baselines (the coordinate
differences in $x$) to the phase measurements in $b$. Thus $A$ involves the geometry of positions
and lengths, as it does for code observations. The matrix $G$ picks out from each double
difference $b_{ij}^{kl}(t)$ the contribution $I_{ij}^{kl}$ from the integers. These ambiguities $I_{ij}^{kl}$ are fixed in
time, nominally at their starting values. The other term $Ax$ accounts for all fractions at the
start and all phase changes as the observations proceed.

We suppose that enough observations have been made to determine (usually they
overdetermine) $x$ and $I$. Algebraically, the combined matrix $[A\ \ G]$ has full column rank
$p + n$. The ordinary normal equations could be solved, but $\hat I$ won't contain integers. We
are hoping to achieve good precision from a short series of observations. (But not too short.
The $n$ observations at one instant are not enough to determine the $n$ integers in $I$ and also
the baseline coordinates.)

The covariance matrix $\Sigma_b$ of the observations is assumed known. Since differences
are correlated, $\Sigma_b$ is not at all a diagonal matrix. This is what makes our problem more
difficult. This also explains the letter D in LAMBDA, for Decorrelation. The method consists
in decorrelating errors (diagonalizing $\Sigma_b$ by a change of variables) as far as possible.
The limitation is that the change of variables and its inverse must take integers to integers.

The usual problem in weighted least squares is to minimize $\|b - Ax - GI\|^2$. The
weighting matrix $\Sigma_b^{-1}$ determines the norm, as in $\|e\|^2 = e^{\mathrm T}\Sigma_b^{-1}e$. The minimization
gives real numbers $\hat x$ and $\hat I$ (not integers!). This estimate for $x$ and $I$ is called the float
solution, and it comes from the ordinary normal equations:

$$
\begin{bmatrix} A & G \end{bmatrix}^{\mathrm T} \Sigma_b^{-1} \begin{bmatrix} A & G \end{bmatrix}
\begin{bmatrix} \hat x \\ \hat I \end{bmatrix}
= \begin{bmatrix} A & G \end{bmatrix}^{\mathrm T} \Sigma_b^{-1} b. \tag{15.23}
$$

We have two block equations, and the left side can be processed first (set $\Sigma_b^{-1} = C$):

$$
\begin{bmatrix} A^{\mathrm T} \\ G^{\mathrm T} \end{bmatrix} C \begin{bmatrix} A & G \end{bmatrix}
= \begin{bmatrix} A^{\mathrm T}CA & A^{\mathrm T}CG \\ G^{\mathrm T}CA & G^{\mathrm T}CG \end{bmatrix}. \tag{15.24}
$$

A triangular factorization comes directly from elimination. The unknowns $\hat x$ are first to
be eliminated, leaving the reduced normal equations for $\hat I$. Exactly as in Section 11.7, we
are multiplying the upper block row by $G^{\mathrm T}CA(A^{\mathrm T}CA)^{-1}$ and subtracting from the lower
block row. The new coefficient matrix in the (2, 2) block is $G^{\mathrm T}C'G$, with the reduced
weight matrix $C'$ in Section 11.7 and (15.28). Elimination of $\hat x$ reaches an equation for $\hat I$:

$$
G^{\mathrm T}C'G\,\hat I = G^{\mathrm T}C'b. \tag{15.25}
$$

When elimination continues, the matrix on the left is factored into $G^{\mathrm T}C'G = LDL^{\mathrm T}$. Then
ordinary forward elimination and back substitution yield $\hat I$. The triangular $L$ (with ones on
the diagonal) and the diagonal matrix $D$ are actually the (2, 2) blocks in a factorization of
the complete matrix in (15.24).

Our problem is that $\hat I$ is not a vector of integers. This float solution minimizes a
quadratic over all real vectors, while our problem is really one of integer least squares:

$$
\text{Minimize } (GI - b)^{\mathrm T} C' (GI - b) \text{ over integer vectors } I. \tag{15.26}
$$

The integer solution will be denoted by $\check I$ and called the final solution. After $\check I$ is found
(this is our real problem), the corresponding $\check x$ will come from back-substitution in (15.23):

$$
A^{\mathrm T}CA\,\check x = A^{\mathrm T}Cb - A^{\mathrm T}CG\,\check I. \tag{15.27}
$$

The right side is known, and $A^{\mathrm T}CA$ on the left side was already factored at the start of
elimination. So $\check x$ is quickly found.


Integer Least Squares 


We are minimizing a quadratic expression over integer variables. The absolute minimum
occurs at $\hat I$; the best integer vector is $\check I$. The coefficient matrix of the second-degree term
in (15.26) is $G^{\mathrm T}C'G$, which is just the (2, 2) block in (15.24) after elimination. For blocks
$A$, $B$, $C$, $D$ that (2, 2) entry will be $Q = D - CA^{-1}B$. (In mathematics this is called the
Schur complement.) With the blocks that appear in (15.24), this matrix $Q$ is

$$
Q = G^{\mathrm T}CG - G^{\mathrm T}CA(A^{\mathrm T}CA)^{-1}A^{\mathrm T}CG \quad (= G^{\mathrm T}C'G). \tag{15.28}
$$

Now consider the minimization for $I$. The quadratic expression in (15.26) has its
absolute minimum at $\hat I$. By adding a constant, the minimum value can be moved to zero.
Knowing that the coefficient matrix is $Q$, the problem (15.26) can be stated in a very clear
and equivalent form:

$$
\text{Minimize } (I - \hat I)^{\mathrm T} Q (I - \hat I) \text{ over integer vectors } I. \tag{15.29}
$$

This is the problem we study, and one case is especially simple. If $Q$ is a diagonal matrix,
the best vector $\check I$ comes from rounding each component of $\hat I$ to the nearest integer. The
components are uncoupled when $Q$ is diagonal. The quadratic in (15.29) is purely a sum of
squares, $\sum Q_{ii}(I_i - \hat I_i)^2$. The minimum comes by making each term as small as possible.
So the best $\check I_i$ is the integer nearest to $\hat I_i$.
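A tiny numerical experiment shows why rounding is safe for a diagonal matrix and unsafe for a strongly correlated one. The Python sketch below is a brute-force illustration over a small box of integer candidates; it is not the LAMBDA search, and the two matrices and the float vector are invented for the example.

```python
from itertools import product


def quadratic(Q, d):
    """Value of d^T Q d for a small dense matrix Q."""
    n = len(d)
    return sum(d[i] * Q[i][j] * d[j] for i in range(n) for j in range(n))


def integer_ls_box(Q, float_sol, radius=2):
    """Best integer vector within `radius` of the rounded float solution."""
    rounded = [round(v) for v in float_sol]
    best, best_val = None, float("inf")
    for offsets in product(range(-radius, radius + 1), repeat=len(float_sol)):
        cand = [r + o for r, o in zip(rounded, offsets)]
        val = quadratic(Q, [c - f for c, f in zip(cand, float_sol)])
        if val < best_val:
            best, best_val = cand, val
    return best, best_val


float_sol = [0.45, 0.6]
print(integer_ls_box([[2.0, 0.0], [0.0, 5.0]], float_sol))     # diagonal Q
print(integer_ls_box([[1.0, -0.9], [-0.9, 1.0]], float_sol))   # correlated Q
```

For the diagonal matrix the winner is the rounded vector (0, 1). For the correlated matrix the winner is (1, 1), and the rounded vector gives a value about ten times larger. The cost of the brute-force box grows exponentially with the number of ambiguities, which is why a decorrelating change of variables is worth the effort.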

Unfortunately the actual $Q$ may be far from diagonal. If we could change variables
at will, we could diagonalize $Q$. But we are not completely free, since $I$ is restricted to
integers. A change of variables to $J = Z^{-1}I$ is only allowed if $Z$ and $Z^{-1}$ are matrices
of integers. Then $J$ is integer exactly when $I$ is integer. The transformed quadratic has
absolute minimum at $\hat J = Z^{-1}\hat I$, and we search for its integer minimum $\check J$:

$$
\text{Minimize } (J - \hat J)^{\mathrm T}(Z^{\mathrm T}QZ)(J - \hat J) \text{ over integer vectors } J. \tag{15.30}
$$

The search is easier if $Z^{\mathrm T}QZ$ is nearly diagonal; its off-diagonal entries should be small.

We can describe in one paragraph the idea behind the choice of $Z$ in the LAMBDA
method. “Integer elimination” will be done in the natural order, starting with the first
row of $Q$. There will certainly be row exchanges, and you will see why. The essential
idea was given by Lenstra & Lenstra & Lovász (1982), and the algorithm is sometimes
called $L^3$. The actual LAMBDA implementation might operate on columns instead of
rows, and might go right to left. But to make the following paragraph clear, we just ask
ourselves how to create near-zeros off the diagonal with integer elimination.

The first pivot is $Q_{11}$ and the entry below it is $Q_{21}$. Normally we multiply the pivot
row by the ratio $l_{21} = Q_{21}/Q_{11}$ and subtract from the second row. This produces a zero in
the (2, 1) position. Our algorithm chooses instead the integer $n_{21}$ that is nearest to $l_{21}$.
That choice produces a near-zero in the (2, 1) position, not larger than $\tfrac12 Q_{11}$:

$$
|Q_{21} - n_{21}Q_{11}| = |l_{21}Q_{11} - n_{21}Q_{11}| \le \tfrac12 Q_{11}. \tag{15.31}
$$

If elimination continues in this usual order, the entry in each off-diagonal $(i, j)$ position
becomes not larger than half of the $j$th pivot $d_j$. Together with the row operations to
reduce the subdiagonal, we are including the corresponding column operations to reduce
the superdiagonal. This produces a symmetric $Q' = Z^{\mathrm T}QZ$. The integers $n_{ij}$ yield $Z^{-1}$
and $Z$ as products of “integer Gauss steps” like

$$
\begin{bmatrix} 1 & 0 \\ -n_{21} & 1 \end{bmatrix}
\quad\text{with inverse}\quad
\begin{bmatrix} 1 & 0 \\ n_{21} & 1 \end{bmatrix}.
$$

So $Z$ and $Z^{-1}$ are integral, and they are assembled in the usual way. But a crucial point
is still to be considered: smaller pivots $d_j$ lead to smaller off-diagonal entries ($\le \tfrac12 d_j$).
We prefer a row ordering in which the small pivots come first. The pivots are not known
in advance, so the LAMBDA algorithm exchanges rows when a small pivot appears later.
Then it recalculates the elimination to achieve (after iteration) the desired ordering.
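One integer Gauss step on a 2 by 2 matrix can be written out explicitly. The Python sketch below is an illustration with a hypothetical function name and an invented matrix; it applies the row operation and the matching column operation, so the result is $EQE^{\mathrm T}$ with $E$ the lower triangular step matrix containing $-n_{21}$.

```python
def integer_gauss_step(Q):
    """Apply one integer Gauss step to a symmetric 2 by 2 matrix Q.

    Returns n21 and the transformed matrix E Q E^T with E = [[1, 0], [-n21, 1]].
    """
    n21 = round(Q[1][0] / Q[0][0])                 # integer nearest to l21
    off = Q[1][0] - n21 * Q[0][0]                  # new (2, 1) and (1, 2) entry
    q22 = Q[1][1] - 2 * n21 * Q[1][0] + n21 * n21 * Q[0][0]
    return n21, [[Q[0][0], off], [off, q22]]


n21, Q_new = integer_gauss_step([[2.0, 5.3], [5.3, 20.0]])
print(n21, Q_new)
```

In this example the ratio is 2.65, so the step uses the integer 3 and leaves −0.7 off the diagonal, within the bound of half the pivot from (15.31). The determinant is unchanged because the step matrix has determinant one.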

A row exchange comes from a permutation matrix. The “decorrelating matrices” $Z$
and $Z^{-1}$ are no longer triangular, but they still contain integers. The new form (15.30) of
the minimization has a more nearly diagonal matrix $Q' = Z^{\mathrm T}QZ$. This greatly reduces
the number of candidate vectors $J$. We display here a typical matrix $Z$ from a specific
calculation (and $Z^{-1}$ also contains integers!):

$$
Z = \begin{bmatrix} -2 & 3 & 1 \\ 3 & -3 & -1 \\ -1 & 1 & 0 \end{bmatrix}
\quad\text{and}\quad
Z^{-1} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & -1 & -3 \end{bmatrix}.
$$

You can see how $Q$ moves significantly toward a diagonal matrix:

$$
Q = \begin{bmatrix} 6.290 & 5.978 & 0.544 \\ 5.978 & 6.292 & 2.340 \\ 0.544 & 2.340 & 6.288 \end{bmatrix}
\quad\text{decorrelates to}\quad
Q' = \begin{bmatrix} 4.476 & 0.334 & 0.230 \\ 0.334 & 1.146 & 0.082 \\ 0.230 & 0.082 & 0.626 \end{bmatrix}.
$$
The original float ambiguities are


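The pivots = (conditional variances)$^{-1}$ for $Q$ and $Q'$ are

$$
\begin{bmatrix} 0.090 \\ 5.421 \\ 6.288 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 4.310 \\ 1.135 \\ 0.626 \end{bmatrix}.
$$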
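The numbers of this example can be checked with a few lines of Python. The sketch assumes the integer matrix $Z$ with rows (−2, 3, 1), (3, −3, −1), (−1, 1, 0), which is the reading of the printed matrix that reproduces $Q' = Z^{\mathrm T}QZ$, and it computes the pivots by eliminating from the last variable upward. The helper names are hypothetical.

```python
Q = [[6.290, 5.978, 0.544],
     [5.978, 6.292, 2.340],
     [0.544, 2.340, 6.288]]
Z = [[-2, 3, 1],
     [3, -3, -1],
     [-1, 1, 0]]          # assumed reading of the printed matrix


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def pivots_bottom_up(M):
    """Pivots of symmetric elimination that starts with the last variable."""
    work = [row[:] for row in M]
    n = len(work)
    pivots = [0.0] * n
    for k in range(n - 1, -1, -1):
        pivots[k] = work[k][k]
        for i in range(k):                      # update the leading k by k block
            factor = work[i][k] / work[k][k]
            for j in range(k):
                work[i][j] -= factor * work[k][j]
    return pivots


Q_prime = matmul(matmul(transpose(Z), Q), Z)
print([[round(v, 3) for v in row] for row in Q_prime])
print([round(p, 3) for p in pivots_bottom_up(Q)])
print([round(p, 3) for p in pivots_bottom_up(Q_prime)])
```

The product of the pivots is the determinant, which is the same for $Q$ and $Q'$ because $Z$ has determinant one. Decorrelation spreads that fixed product much more evenly over the three pivots.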


Notice that LAMBDA reverses the elimination order, up instead of down! The final
ambiguities are not rounded values of $\hat I$, but close:


The matrix $Q'$ is more nearly diagonal than $Q$. The integer least squares problem (=
shortest lattice vector problem) is easier because the ellipsoids $J^{\mathrm T}Q'J$ = constant are not
so elongated. But we still have to search for the $J$ that minimizes $(J - \hat J)^{\mathrm T}Q'(J - \hat J)$. We
could search in a ball around the float solution $\hat J$, but the actual LAMBDA implementation
is more subtle.

Somewhere the algorithm will factor $Q'$ into $LDL^{\mathrm T}$. These factors come from elimination,
and they indicate search ranges for the different components of $J$. The off-diagonal
entries of $L$ reflect correlation that has not been removed. The search ellipsoid around $\hat J$
has its volume controlled by the constant $c$:

$$
(J - \hat J)^{\mathrm T} Q' (J - \hat J) = \sum_i d_i \Bigl( J_i - \hat J_i + \sum_{j > i} l_{ji}(J_j - \hat J_j) \Bigr)^2 \le c. \tag{15.32}
$$

de Jonge & Tiberius (1996) search for the components $J_i$ in the order $n, n - 1, \ldots, 1$.
When index $i$ is reached, a list of possibilities has been created for all $J_k$ with $k > i$.
For each of those possibilities, the bound (15.32) allows a finite search interval (probably
small and possibly empty) of integer candidates $J_i$. When we successfully reach $i = 1$,
a complete candidate vector $J$ satisfying (15.32) has been found. The search terminates
when all candidates are known.

We chose $c$ large enough to be certain that there is at least one candidate (for example,
$\hat J$ rounded to nearest integers). Then there is an efficient recursion to compute
$(J - \hat J)^{\mathrm T}Q'(J - \hat J)$ for all candidates.
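The depth-first search over the ellipsoid can be sketched compactly. The Python code below is an illustration under stated assumptions and not the LAMBDA implementation of de Jonge & Tiberius: it uses an ordinary top-down $LDL^{\mathrm T}$ factorization, visits the components from the last to the first, and returns every integer vector inside the bound together with its quadratic value. The function names and the 2 by 2 example are hypothetical.

```python
import math


def ldl(Q):
    """Factor a symmetric positive definite Q into L D L^T (L unit lower triangular)."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = Q[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        L[j][j] = 1.0
        for i in range(j + 1, n):
            L[i][j] = (Q[i][j]
                       - sum(L[i][k] * L[j][k] * D[k] for k in range(j))) / D[j]
    return L, D


def search(Q, float_sol, c):
    """All integer vectors J with (J - Jhat)^T Q (J - Jhat) <= c, sorted by value."""
    L, D = ldl(Q)
    n = len(Q)
    found = []

    def descend(i, J, used):
        if i < 0:
            found.append((used, tuple(J)))
            return
        # contribution of the components already chosen (indices above i)
        shift = sum(L[j][i] * (J[j] - float_sol[j]) for j in range(i + 1, n))
        center = float_sol[i] - shift
        half_width = math.sqrt(max(c - used, 0.0) / D[i])
        for cand in range(math.ceil(center - half_width),
                          math.floor(center + half_width) + 1):
            J[i] = cand
            descend(i - 1, J, used + D[i] * (cand - center) ** 2)
        J[i] = 0

    descend(n - 1, [0] * n, 0.0)
    return sorted(found)


for value, vector in search([[1.0, -0.9], [-0.9, 1.0]], [0.45, 0.6], 0.7):
    print(vector, round(value, 4))
```

The bound c = 0.7 was chosen just above the value of the rounded vector (0, 1), so at least one candidate is guaranteed; the best candidate turns out to be (1, 1).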


15.7 Sequential Filter for Absolute Position 


To find the absolute position of a point is a very fundamental problem in positional GPS. 
We already have mentioned several methods to achieve the goal. We shall deal with one 
more method which is described by Bancroft (1985). 
The implementation is done via filters to demonstrate the effect of each additional
pseudorange. After three pseudoranges the covariance is expected to be very large. The 
fourth one is crucial because in the absence of other error sources, four pseudoranges 
determine the exact position and time. 

We start by exposing the method in detail. It is implemented in the M-file bancroft. 
As always the raw pseudoranges have to be corrected for the tropospheric delay. This is a 
function of the satellite’s elevation angle, so the correction for tropospheric delay needs at 
least two iterations. (Recently Jin (1996) has introduced a Taylor series expansion of the 
observation equations to avoid iteration.) 

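The observation equation of a single pseudorange $P^k$ is

$$
P^k = \sqrt{(X^k - X)^2 + (Y^k - Y)^2 + (Z^k - Z)^2} + c\,dt. \tag{15.33}
$$

We substitute the receiver clock offset $c\,dt$ by $b$. Move this to the left side, and square both
sides:

$$
\begin{aligned}
P^kP^k - 2P^kb + b^2 &= (X^k - X)^2 + (Y^k - Y)^2 + (Z^k - Z)^2 \\
&= X^kX^k - 2X^kX + X^2 + Y^kY^k - 2Y^kY + Y^2 + Z^kZ^k - 2Z^kZ + Z^2.
\end{aligned}
$$

We rearrange and get

$$
(X^kX^k + Y^kY^k + Z^kZ^k - P^kP^k) - 2(X^kX + Y^kY + Z^kZ - P^kb)
= -(X^2 + Y^2 + Z^2 - b^2).
$$

This expression asks for the Lorentz inner product (which is computed by lorentz):

$$
\langle g, h \rangle = g^{\mathrm T}Mh \quad\text{with}\quad
M = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & -1 \end{bmatrix}.
$$

Using this inner product, with $r^k = (X^k, Y^k, Z^k)$ and $r = (X, Y, Z)$, the equation above becomes

$$
\frac12 \left\langle \begin{bmatrix} r^k \\ P^k \end{bmatrix}, \begin{bmatrix} r^k \\ P^k \end{bmatrix} \right\rangle
- \left\langle \begin{bmatrix} r^k \\ P^k \end{bmatrix}, \begin{bmatrix} r \\ b \end{bmatrix} \right\rangle
+ \frac12 \left\langle \begin{bmatrix} r \\ b \end{bmatrix}, \begin{bmatrix} r \\ b \end{bmatrix} \right\rangle = 0. \tag{15.34}
$$

Every pseudorange gives rise to an equation of the type (15.34). Four equations are sufficient
to solve for the receiver coordinates $(X, Y, Z)$ and the receiver clock offset $b = c\,dt$.
All our known quantities are in the matrix

$$
B = \begin{bmatrix}
X^1 & Y^1 & Z^1 & P^1 \\
X^2 & Y^2 & Z^2 & P^2 \\
X^3 & Y^3 & Z^3 & P^3 \\
X^4 & Y^4 & Z^4 & P^4
\end{bmatrix}.
$$

Here $X^k$, $Y^k$, and $Z^k$ denote the geocentric coordinates for the $k$th satellite at time of
transmission and $P^k$ is the observed pseudorange. The four observation equations are

$$
\alpha - BM \begin{bmatrix} r \\ b \end{bmatrix} + \Lambda e = 0 \tag{15.35}
$$

where $e = (1, 1, 1, 1)$, with

$$
\Lambda = \frac12 \left\langle \begin{bmatrix} r \\ b \end{bmatrix}, \begin{bmatrix} r \\ b \end{bmatrix} \right\rangle
\quad\text{and}\quad
\alpha_k = \frac12 \left\langle \begin{bmatrix} r^k \\ P^k \end{bmatrix}, \begin{bmatrix} r^k \\ P^k \end{bmatrix} \right\rangle.
$$

We solve (15.35) and get

$$
\begin{bmatrix} r \\ b \end{bmatrix} = MB^{-1}(\Lambda e + \alpha). \tag{15.36}
$$

Since $r$ and $b$ also enter $\Lambda$, we insert (15.36) into (15.35) and use $\langle Mg, Mh \rangle = \langle g, h \rangle$:

$$
\langle B^{-1}e, B^{-1}e \rangle \Lambda^2 + 2\bigl(\langle B^{-1}e, B^{-1}\alpha \rangle - 1\bigr)\Lambda + \langle B^{-1}\alpha, B^{-1}\alpha \rangle = 0. \tag{15.37}
$$

This equation is quadratic in $\Lambda$. There are two possible $\Lambda$'s, which give two solutions by
(15.36). One of these solutions is the correct one.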

Often five or more pseudoranges are observed. We advocate using all available observations to calculate the receiver position. This changes equation (15.35) to a set of normal equations; that is, we multiply from the left by $B^T$:

$$B^T a - B^T B M \begin{bmatrix} r \\ b \end{bmatrix} + \Lambda B^T e = 0. \tag{15.38}$$

The same development as above, with the pseudoinverse $B^+ = (B^T B)^{-1} B^T$ in place of $B^{-1}$, gives the equation for $\Lambda$:

$$\langle B^+e, B^+e \rangle \Lambda^2 + 2\bigl(\langle B^+e, B^+a \rangle - 1\bigr)\Lambda + \langle B^+a, B^+a \rangle = 0. \tag{15.39}$$

This expression comprises all observations in the sense of least squares. That completes the theory. Now we turn to the MATLAB implementation b_point. Any comprehensive GPS code must be able to calculate a satellite position at a given time. This is done by calls of the M-files get_eph, find_eph, and satpos.
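The quadratic (15.39) and the back-substitution (15.36) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the M-file bancroft: the helper names `solve` and `bancroft` are hypothetical, the pseudoranges are assumed to be already corrected, and the rule for picking the correct root (smallest pseudorange residuals) is our own choice.

```python
import math

def lorentz(g, h):
    """Lorentz inner product <g, h> = g^T M h with M = diag(1, 1, 1, -1)."""
    return g[0] * h[0] + g[1] * h[1] + g[2] * h[2] - g[3] * h[3]

def solve(A, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    T = [list(A[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[p] = T[p], T[c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for j in range(c, n + 1):
                T[r][j] -= f * T[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (T[r][n] - sum(T[r][j] * x[j] for j in range(r + 1, n))) / T[r][r]
    return x

def bancroft(sats, prs):
    """Return (X, Y, Z, b) from satellite positions and corrected pseudoranges."""
    B = [[s[0], s[1], s[2], p] for s, p in zip(sats, prs)]
    a = [0.5 * lorentz(row, row) for row in B]
    # Normal equations: B^+ w = (B^T B)^{-1} B^T w (also valid for exactly four rows).
    BtB = [[sum(r[i] * r[j] for r in B) for j in range(4)] for i in range(4)]
    Bte = [sum(r[i] for r in B) for i in range(4)]
    Bta = [sum(r[i] * ak for r, ak in zip(B, a)) for i in range(4)]
    u = solve(BtB, Bte)  # B^+ e
    v = solve(BtB, Bta)  # B^+ a
    # Quadratic (15.39): qa*L^2 + qb*L + qc = 0, solved in a cancellation-safe form.
    qa, qb, qc = lorentz(u, u), 2.0 * (lorentz(u, v) - 1.0), lorentz(v, v)
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    q = -0.5 * (qb + math.copysign(disc, qb))
    best = None
    for lam in (q / qa, qc / q):
        y = [lam * ui + vi for ui, vi in zip(u, v)]
        cand = (y[0], y[1], y[2], -y[3])  # multiply by M, as in (15.36)
        # Assumption: keep the root whose pseudorange residuals are smallest.
        res = sum(abs(p - math.dist(s, cand[:3]) - cand[3]) for s, p in zip(sats, prs))
        if best is None or res < best[0]:
            best = (res, cand)
    return best[1]
```

Forming $B^T B$ explicitly keeps the sketch short; a production code would more likely use an orthogonal factorization of $B$.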

For more precise GPS calculations one needs to correct phase observations for the 
delay through the troposphere. A lot of procedures have been proposed; we implemented 
a method due to Goad & Goodman (1974) as the M-file tropo. The tropospheric delay 
mainly depends on the elevation angle to the satellite. (The M-file tropp contains hints on 
handling graphics of axes, contour labels and lines.) 

The M-file topocent yields a value for the elevation angle El of a satellite. This is
the most important parameter in the function tropo. The function also needs the
receiver's elevation above sea level, which is obtained by a call of togeod.

We transform a topocentric vector $x$ into a local $e, n, u$ coordinate system, with $u$ in the direction of the plumb line, $n$ pointing north, and $e$ pointing east. The topocenter is given by the geocentric vector $X$ with latitude $\varphi$ and longitude $\lambda$, and the three unit vectors go into the orthogonal matrix

$$F = \begin{bmatrix} e & n & u \end{bmatrix} = \begin{bmatrix} -\sin\lambda & -\sin\varphi\cos\lambda & \cos\varphi\cos\lambda \\ \cos\lambda & -\sin\varphi\sin\lambda & \cos\varphi\sin\lambda \\ 0 & \cos\varphi & \sin\varphi \end{bmatrix}. \tag{15.40}$$

Let $(E, N, U) = F^T x$. Immediately we have azimuth, elevation angle, and length:

Azimuth: $Az = \arctan(E/N)$

Elevation angle: $El = \arctan\bigl(U/\sqrt{N^2 + E^2}\bigr)$

Length: $s = \|x\|$.
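A minimal Python sketch of this transformation follows. It is not the M-file topocent: the function name and argument order are hypothetical, the latitude and longitude of the topocenter are assumed to be given directly in degrees, and `atan2` is used instead of a plain arctangent so that the azimuth falls in the correct quadrant.

```python
import math

def topocent(lat_deg, lon_deg, x):
    """Return (azimuth_deg, elevation_deg, length) of the topocentric vector x."""
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    sp, cp = math.sin(phi), math.cos(phi)
    sl, cl = math.sin(lam), math.cos(lam)
    # Columns of F in (15.40): east, north, up.
    e = (-sl, cl, 0.0)
    n = (-sp * cl, -sp * sl, cp)
    u = (cp * cl, cp * sl, sp)
    # (E, N, U) = F^T x
    E = sum(a * b for a, b in zip(e, x))
    N = sum(a * b for a, b in zip(n, x))
    U = sum(a * b for a, b in zip(u, x))
    az = math.degrees(math.atan2(E, N)) % 360.0
    el = math.degrees(math.atan2(U, math.hypot(N, E)))
    return az, el, math.sqrt(sum(c * c for c in x))
```

For a topocenter on the equator at longitude zero, the geocentric Y axis points east, so `topocent(0.0, 0.0, (0.0, 1.0, 0.0))` gives an azimuth of 90 degrees and zero elevation.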


The central part of the b_point code also uses togeod and of course the important M-file bancroft. The received time $t_R^{\mathrm{GPS}}$ and the transmit time $t_k^{\mathrm{GPS}}$ are

$$t_R^{\mathrm{GPS}} = t_R - dt_R \quad\text{and}\quad t_k^{\mathrm{GPS}} = t_k + dt_k.$$

The pseudorange $P$ is $c(t_R - t_k)$ with corrections trop and ion:

$$P = \underbrace{\underbrace{c\bigl(t_R^{\mathrm{GPS}} - t_k^{\mathrm{GPS}}\bigr)}_{\rho} + c\,dt_R}_{\text{Bancroft}} + c\,dt_k + \mathrm{trop} + \mathrm{ion} = \rho + c\,dt_R + c\,dt_k + \mathrm{trop} + \mathrm{ion}.$$

The Bancroft model handles the first two terms. Additionally the pseudorange must be corrected by the term $c\,dt_k$. Remember to subtract trop. We set ion = 0. The auxiliary code lorentz, called by bancroft, calculates $x^T M x = x_1^2 + x_2^2 + x_3^2 - x_4^2$.

The final result from b_point is the very best position one can deduce from a given 
set of pseudoranges. All possible corrections have been taken into account. 

The Bayes filter yields one solution through the final update, but the Kalman filter
is more illustrative: we can follow the contribution of each individual observation. Of
course the calculations are similar for both methods. Yet the M-file k_point contains a few
special calls. One of those is the M-file e_r_corr. It corrects the satellite position for Earth
rotation during signal travel time. Let $\omega_e$ denote the Earth's rotation rate, $x^S$ the satellite
position before rotation, and x_sat_rot the position after rotation. The rotation matrix is $R_3$
and the angle is $\omega\tau = \omega_e \rho / c$. Hence x_sat_rot $= R_3(\omega\tau)\,x^S$.
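The rotation can be sketched in Python as follows. This is an illustration rather than the M-file e_r_corr: the numerical value of the rotation rate (the WGS 84 value) and the sign convention of $R_3$ (a rotation about the Z axis with $+\sin$ in the first row) are assumptions on our part.

```python
import math

OMEGA_E = 7.2921151467e-5  # rad/s, Earth rotation rate (WGS 84 value, assumed)
C = 299792458.0            # m/s, speed of light

def e_r_corr(rho, x_sat):
    """Rotate the satellite position about the Z axis by omega_e * rho / c."""
    wt = OMEGA_E * rho / C
    c, s = math.cos(wt), math.sin(wt)
    x, y, z = x_sat
    # R3(wt) = [[c, s, 0], [-s, c, 0], [0, 0, 1]] (assumed sign convention)
    return (c * x + s * y, -s * x + c * y, z)
```

With a range of about 21 000 km the travel time is roughly 0.07 s, and the satellite position shifts by a little over one hundred meters; the Z coordinate and the geocentric distance are unchanged.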

The file normals adds a given contribution to the normals and k_ud performs the 
Kalman update after reading a new observation. 


We conclude with a numerical example. The epoch contains PRNs 23, 9, 5, 1, 21, and 17.
The M-file satpos computes the following positions at corrected time of epoch. Each row
lists the coordinates $X^k$, $Y^k$, $Z^k$ of one satellite; the fourth value in each row is
consistent with the pseudorange $P^k$, so the rows have the form of the rows of $B$:

          X^k (m)           Y^k (m)           Z^k (m)           P^k (m)
    14 177 553.47    -18 814 768.09     12 243 866.38     21 119 278.32
    -8 206 488.95    -18 217 989.14     17 605 231.99     20 951 647.38
    15 097 199.81     -4 636 088.67     21 326 706.55     22 527 064.18
     1 399 988.07    -17 563 734.90     19 705 591.18     20 155 401.42
    23 460 342.33     -9 433 518.58      8 174 941.25     23 674 159.88
     6 995 655.48    -23 537 808.26     -9 927 906.48     24 222 110.91

Knowing the satellite positions and the measured pseudoranges we estimate the receiver
coordinates by the Bancroft procedure. The M-file abs_pos yields the following result:

$$\hat X = 596\,889.19\ \text{m}, \quad \hat Y = -4\,847\,827.33\ \text{m}, \quad \hat Z = 4\,088\,207.25\ \text{m}, \quad \widehat{dt} = -5.28\ \text{ns}.$$

The value $\widehat{dt}$ is the estimated offset of the receiver clock compared to GPS time. We find
it useful to state explicitly the expression for the corrected pseudorange:

$$\text{corrected\_pseudorange} = P + c\,dt^k - \mathrm{trop}. \tag{15.41}$$

The standard deviation of the receiver position is less than 27 m. This calculation is
continued in a filter setting as Examples 17.7 and 17.8.


An Alternative Algorithm for Receiver Position 


We mention another method for finding preliminary receiver coordinates (X, Y, Z) from
four pseudoranges $P^k$, see Kleusberg (1994). The geometrically oriented reader may find
this method easier to understand. Our point of departure is again the basic equation (15.33):

$$P^k = \sqrt{(X^k - X)^2 + (Y^k - Y)^2 + (Z^k - Z)^2} + c\,dt, \quad k = 1, 2, 3, 4. \tag{15.42}$$

We subtract $P^1$ from $P^2$, $P^3$, and $P^4$. This eliminates the receiver clock offset $dt$:

$$d_l = P^l - P^1 = \sqrt{(X^l - X)^2 + (Y^l - Y)^2 + (Z^l - Z)^2} - \sqrt{(X^1 - X)^2 + (Y^1 - Y)^2 + (Z^1 - Z)^2} = \rho_l - \rho_1, \quad l = 2, 3, 4. \tag{15.43}$$


Once the coordinates (X, Y, Z) have been computed, dt can be determined from (15.42). 

The three quantities $d_2, d_3, d_4$ are differences between distances to known positions
of satellites. Points with correct $d_l$ lie on one sheet of a hyperboloid (points on the other
sheet have $-d_l$). Its axis of symmetry is the line between satellites $l$ and 1. The hyperboloids
intersect at the receiver position, and normally there exist two solutions. So at the
end we have to identify which one is the correct solution.

Let $b_2, b_3, b_4$ be the known distances from satellite 1 to satellites 2, 3, 4, along unit
vectors $e_2, e_3, e_4$, and let $e_1$ be the unit vector from satellite 1 toward the receiver R. From the cosine law for triangle 1-$l$-R follows

$$\rho_l^2 = b_l^2 + \rho_1^2 - 2 b_l \rho_1\, e_1 \cdot e_l. \tag{15.44}$$

We rewrite (15.43) as $\rho_l = d_l + \rho_1$ and square the expression

$$\rho_l^2 = d_l^2 + \rho_1^2 + 2 d_l \rho_1. \tag{15.45}$$

Equating (15.44) and (15.45) yields

$$2\rho_1 = \frac{b_l^2 - d_l^2}{d_l + b_l\, e_1 \cdot e_l}. \tag{15.46}$$

Figure 15.5 Four satellites and receiver R. Geometric distance between R and satellite k
is $\rho_k$. Inter-satellite distance $b_l$ along the unit vector $e_l$, $l = 2, 3, 4$.


These are three equations for $\rho_1$, or equivalently the following equations:

$$\frac{b_2^2 - d_2^2}{d_2 + b_2\, e_1 \cdot e_2} = \frac{b_3^2 - d_3^2}{d_3 + b_3\, e_1 \cdot e_3} = \frac{b_4^2 - d_4^2}{d_4 + b_4\, e_1 \cdot e_4}. \tag{15.47}$$

Now $\rho_1$ is eliminated and the only unknown is the 3 by 1 unit vector $e_1$.

Some rewritings result in the two scalar equations

$$e_1 \cdot f_m = u_m, \quad m = 2, 3. \tag{15.48}$$

Here we have used the following abbreviations:

$$F_m = \frac{b_m}{b_m^2 - d_m^2}\, e_m - \frac{b_{m+1}}{b_{m+1}^2 - d_{m+1}^2}\, e_{m+1}, \qquad f_m = \frac{F_m}{\|F_m\|}, \qquad u_m = \frac{1}{\|F_m\|} \left( \frac{d_{m+1}}{b_{m+1}^2 - d_{m+1}^2} - \frac{d_m}{b_m^2 - d_m^2} \right).$$

The unit vector $f_2$ lies in the plane through satellites 1, 2, 3. This plane is spanned by $e_2$
and $e_3$. Similarly $f_3$ is in the plane determined by satellites 1, 3, and 4. The two vectors
$f_2$ and $f_3$ and the right sides $u_2$ and $u_3$ may be computed from the known coordinates of
the satellites and the measured differences of distances.

However we want to present a geometric procedure based on vector algebra. Equation (15.48)
determines the cosines of the angles between the two given unit vectors $f_2$ and $f_3$ and the sought
unit vector $e_1$. In general the problem has two solutions, one above and one below the
plane spanned by $f_2$ and $f_3$. In case $f_2$ and $f_3$ are parallel their vector product is zero and
there are infinitely many solutions.

In principle the solution of (15.48) may be found after a parametrization into spherical
coordinates of the unit vectors $e_1$, $f_2$, and $f_3$. The two solutions for $e_1$ can then follow
as plane solutions on the unit sphere. However this procedure leads to problems with
determination of signs. In order to circumvent this problem we proceed from the basic formula
for double vector products:

$$e_1 \times (f_2 \times f_3) = f_2\,(e_1 \cdot f_3) - f_3\,(e_1 \cdot f_2). \tag{15.49}$$

Comparing with (15.48) the scalar products on the right side can immediately be identified
with the known quantities $u_3$ and $u_2$. We substitute $h$ for the right side and $g$ for $f_2 \times f_3$. Then
(15.49) becomes

$$e_1 \times g = h. \tag{15.50}$$

The two solutions of (15.48) are

$$e_1^{\pm} = \frac{1}{g \cdot g} \Bigl( g \times h \pm g \sqrt{g \cdot g - h \cdot h} \Bigr). \tag{15.51}$$

As long as $g \cdot g \ne 0$ this gives two solutions. When $f_2$ and $f_3$ are parallel this product
equals zero!

Knowing the solutions $e_1^+$ and $e_1^-$ we insert into one of the three equations (15.46)
and get

$$\rho_1^{\pm} = \frac{b_l^2 - d_l^2}{2\,(d_l + b_l\, e_1^{\pm} \cdot e_l)}. \tag{15.52}$$

Now different solution situations may appear. The two distances $\rho_1^+$ and $\rho_1^-$ may or may
not both be positive. In case they both are positive we have two points of intersection of
the hyperboloids: one above and one below the plane spanned by the vectors $f_2$ and $f_3$.
The solution wanted must have a distance of about 6 700 km from the origin. In case one
distance $\rho_1^{\pm}$ is negative, it can be omitted: there is only one intersection
point, and the solution corresponding to the nonexistent intersection point has a negative
denominator which is not compatible with equation (15.46). With the correctly identified
solution $\rho_1^{\text{correct}}$ from (15.52) and the corresponding unit vector $e_1^{\text{correct}}$ we finally get
the receiver coordinates

$$X = X_1 + \rho_1^{\text{correct}}\, e_1^{\text{correct}}. \tag{15.53}$$


The solution is implemented as the M-file kleus. 
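The chain (15.43), (15.48), (15.51), (15.52), (15.53) can be followed step by step in a short Python sketch. It is not the M-file kleus: the function and helper names are hypothetical, the vectors are written $f_2$, $f_3$ as in (15.48), and the sketch simply returns every candidate with positive $\rho_1$ and leaves the final choice to the caller.

```python
import math

def sub(a, b): return [x - y for x, y in zip(a, b)]
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def scale(a, s): return [x * s for x in a]
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def kleusberg(sats, P):
    """sats: four satellite positions, P: four pseudoranges.
    Returns the candidate receiver positions with positive rho_1."""
    d = [P[l] - P[0] for l in (1, 2, 3)]            # d_2, d_3, d_4 from (15.43)
    bv = [sub(sats[l], sats[0]) for l in (1, 2, 3)]
    b = [math.sqrt(dot(v, v)) for v in bv]          # b_2, b_3, b_4
    e = [scale(v, 1.0 / n) for v, n in zip(bv, b)]  # e_2, e_3, e_4
    w = [b[i] ** 2 - d[i] ** 2 for i in range(3)]
    f, u = [], []
    for m in (0, 1):                                # m = 2, 3 in the text
        F = sub(scale(e[m], b[m] / w[m]), scale(e[m + 1], b[m + 1] / w[m + 1]))
        nF = math.sqrt(dot(F, F))
        f.append(scale(F, 1.0 / nF))
        u.append((d[m + 1] / w[m + 1] - d[m] / w[m]) / nF)
    g = cross(f[0], f[1])                           # g = f_2 x f_3
    h = sub(scale(f[0], u[1]), scale(f[1], u[0]))   # h = f_2 u_3 - f_3 u_2
    gg = dot(g, g)
    root = math.sqrt(max(gg - dot(h, h), 0.0))
    gxh = cross(g, h)
    out = []
    for sign in (1.0, -1.0):
        e1 = [(gxh[i] + sign * root * g[i]) / gg for i in range(3)]  # (15.51)
        rho1 = w[0] / (2.0 * (d[0] + b[0] * dot(e1, e[0])))          # (15.52), l = 2
        if rho1 > 0.0:
            out.append([sats[0][i] + rho1 * e1[i] for i in range(3)])  # (15.53)
    return out
```

When two candidates are returned, the one whose distance from the origin is close to the Earth's radius is the receiver position.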


15.8 Additional Useful Filters 


To demonstrate what filters can really do, we start with simple examples relevant to GPS.
Suppose $P_1, \Phi_1, P_2, \Phi_2$ are double differenced observations between two receivers and
two satellites. The observation equation for each epoch is

$$\begin{bmatrix} 1 & 0 & 0 \\ 1 & \lambda_1 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} \rho^* \\ N_1 \\ N_2 \end{bmatrix} = \begin{bmatrix} P_1 \\ \Phi_1 \\ P_2 \\ \Phi_2 \end{bmatrix}.$$

This is four equations in three unknowns: the ideal pseudorange $\rho^*$ and the two ambiguities
on frequencies $L_1$ and $L_2$. The covariance matrix for the observations is

$$\Sigma_e = \begin{bmatrix} 0.3^2 & & & \\ & 0.005^2 & & \\ & & 0.3^2 & \\ & & & 0.005^2 \end{bmatrix}. \tag{15.54}$$

The covariance matrix for the errors $\epsilon$ in the state equation is

$$\Sigma_\epsilon = \begin{bmatrix} 10 & & \\ & 0 & \\ & & 0 \end{bmatrix}.$$

The variance of $\rho^*$ equals 10 m² while the ambiguities have zero variance. The initial value
for $x_0$ is found as a least squares solution of the four equations in the three unknowns at
epoch 5. The early data are quite noisy due to a cold start of the receiver, with

$$P_{0|0} = \begin{bmatrix} 10 & & \\ & 10 & \\ & & 10 \end{bmatrix}.$$

The output of the M-file k_dd3 is a plot as well as filtered values for $N_1$ and $N_1 - N_2$.
Of course, the filtered values are not integers. We have added the Goad algorithm for
rounding to integers as described in equations (15.10) and (15.11). The output indicates
the effectiveness of this algorithm. For the given data samples the Goad algorithm finds
the correct ambiguities every time.
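One epoch of such a filter can be sketched in Python. This is not the M-file k_dd3: the function name `kalman_step` is hypothetical, the state transition is taken to be the identity, and the four observations are processed one at a time, which is legitimate here only because the observation covariance (15.54) is diagonal.

```python
def kalman_step(x, P, A, b, r, q):
    """One epoch: predict with F = I and diagonal system noise q, then update
    with the rows of A one by one (r holds the observation variances)."""
    n = len(x)
    # Prediction: the state is unchanged, the covariance grows by diag(q).
    P = [[P[i][j] + (q[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    for a, bi, ri in zip(A, b, r):
        Pa = [sum(P[i][j] * a[j] for j in range(n)) for i in range(n)]
        s = sum(a[i] * Pa[i] for i in range(n)) + ri       # innovation variance
        K = [v / s for v in Pa]                            # gain for this row
        innov = bi - sum(a[i] * x[i] for i in range(n))
        x = [x[i] + K[i] * innov for i in range(n)]
        P = [[P[i][j] - K[i] * Pa[j] for j in range(n)] for i in range(n)]
    return x, P
```

A typical call uses `A = [[1, 0, 0], [1, lam1, 0], [1, 0, 0], [1, 0, lam2]]` with the carrier wavelengths `lam1` of about 0.1903 m and `lam2` of about 0.2442 m, `r = [0.3**2, 0.005**2, 0.3**2, 0.005**2]`, and `q = [10.0, 0.0, 0.0]`. The filtered ambiguities in `x[1]` and `x[2]` are real numbers; rounding to integers is a separate step.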


Ionospheric delay


We mentioned earlier that the tropospheric delay has its minimum in the direction of zenith.
In zenith the ionospheric delay can vary from a few meters to many tens of meters. Fortunately
the ionosphere is dispersive: the refraction index depends on the frequency.

From equation (15.8) we derive a dual frequency ionospheric correction. We repeat
the two pseudorange observations

$$P_1 = \rho^* + I - e_1$$
$$P_2 = \rho^* + (f_1/f_2)^2 I - e_2.$$

Ignoring the error terms $e_i$, and eliminating the ionospheric delay $I$, we get an expression
for the ideal pseudorange $\rho^*$ freed from $I$:

$$\rho^* = P_1 - \frac{P_1 - P_2}{1 - \beta} = P_1 + 1.545\,727\,780\,(P_1 - P_2). \tag{15.55}$$

Here $\beta = (f_1/f_2)^2 = (77/60)^2$, so that $-1/(1 - \beta) = 3600/2329$.
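The elimination in (15.55) amounts to two lines of arithmetic. The following Python sketch evaluates it; the function names are hypothetical, and the carrier frequencies are written as the multiples 154 and 120 of the 10.23 MHz fundamental frequency, which reproduces the ratio 77/60.

```python
F1 = 154 * 10.23e6  # L1 carrier frequency in Hz
F2 = 120 * 10.23e6  # L2 carrier frequency in Hz
BETA = (F1 / F2) ** 2

def iono_delay(p1, p2):
    """Ionospheric delay I on L1 from P1 - P2 = (1 - beta) I."""
    return (p1 - p2) / (1.0 - BETA)

def iono_free(p1, p2):
    """Ideal pseudorange rho* = P1 - I, freed from the ionospheric delay."""
    return p1 - iono_delay(p1, p2)
```

Because the factor multiplying $P_1 - P_2$ is about 1.5, the noise of the ionosphere-free combination is noticeably larger than the noise of either pseudorange alone.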


Next we describe a modified version k_dd4 that works with four unknowns: the ideal
pseudorange $\rho^*$, the ionospheric delay $I$ (now included), and the integer ambiguities $N_1$
and $N_2$. We still observe pseudorange and phase on both frequencies, with $\beta = (f_1/f_2)^2$:

$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & \lambda_1 & 0 \\ 1 & \beta & 0 & 0 \\ 1 & -\beta & 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} \rho^* \\ I \\ N_1 \\ N_2 \end{bmatrix} = \begin{bmatrix} P_1 \\ \Phi_1 \\ P_2 \\ \Phi_2 \end{bmatrix}.$$

The covariance matrix for $P_1, \Phi_1, P_2, \Phi_2$ is again the $\Sigma_e$ described in equation (15.54).
The initial value for $x_0$ is again found as a solution to the four equations in the four
unknowns at epoch 5. The early data are quite noisy due to a cold start of the receiver. The
covariance matrix for the errors $\epsilon$ in the state equation is set to

$$\Sigma_\epsilon = \begin{bmatrix} 100 & & & \\ & 100 & & \\ & & 0 & \\ & & & 0 \end{bmatrix}.$$

The standard deviation is 10 m for the range $\rho^*$ and 10 m for the ionospheric delay. The
ambiguities $N_1$ and $N_2$ again have zero variances.

The output contains the filtered values for $\rho^*$, $I$, $N_w$ and the difference $dN_w$ between
the widelane value $N_w = N_1 - N_2$ found by k_dd3 and the actual value. The ambiguities
are evidently less reliable using this filter than the k_dd3 values. This is due to the
inclusion of the ionospheric delay as an unknown.

The ionospheric delay can change rapidly in absolute value. Variations depend on 
season, latitude, time of day, and other parameters. Extensive studies of the ionospheric 
delay have been made by Klobuchar, see Klobuchar (1996). 


Example 15.1 (Estimation of receiver clock offset) In GPS surveying, the code observations
$b_i$ (pseudoranges) are often used to estimate the coordinates X, Y, Z of a single point
and the sequence of receiver clock offsets (changing with time). Suppose that epoch $i$
contributes the following linearized observation

$$A_i \begin{bmatrix} x \\ y \\ z \end{bmatrix} + e_i\, c\,dt_i = b_i - \epsilon_i, \quad i = 1, 2, \ldots, n. \tag{15.56}$$

The matrix $A_i$ contains the partial derivatives of the observation with respect to the
coordinates of the point at epoch $i$, and $e_i = (1, 1, \ldots, 1)$ is a vector of as many ones as
there are satellites in the epoch.

Figure 15.6 Offsets for different receiver types. The clock reset is 1 millisecond.


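We gather the observations from all $n$ epochs into a unified least squares problem

$$\begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_n \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} e_1 & & & \\ & e_2 & & \\ & & \ddots & \\ & & & e_n \end{bmatrix} \begin{bmatrix} c\,dt_1 \\ c\,dt_2 \\ \vdots \\ c\,dt_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} - \epsilon. \tag{15.57}$$

The normal equations are

$$\begin{bmatrix} e_1^T e_1 & & & & e_1^T A_1 \\ & e_2^T e_2 & & & e_2^T A_2 \\ & & \ddots & & \vdots \\ & & & e_n^T e_n & e_n^T A_n \\ A_1^T e_1 & A_2^T e_2 & \cdots & A_n^T e_n & \sum_{j=1}^n A_j^T A_j \end{bmatrix} \begin{bmatrix} c\,dt_1 \\ c\,dt_2 \\ \vdots \\ c\,dt_n \\ x \end{bmatrix} = \begin{bmatrix} e_1^T b_1 \\ e_2^T b_2 \\ \vdots \\ e_n^T b_n \\ \sum_{j=1}^n A_j^T b_j \end{bmatrix} \tag{15.58}$$

where the last block of unknowns $x$ stands for $(x, y, z)$. By ordinary Gauss elimination we subtract multiples of the first $n$ equations from the last
block row. We write $E_j$ for the matrix $e_j (e_j^T e_j)^{-1} e_j^T$. The correction $(x, y, z)$ of the
preliminary position $(X^0, Y^0, Z^0)$ is determined by the matrix that appears in the last corner:

$$\begin{bmatrix} \hat x \\ \hat y \\ \hat z \end{bmatrix} = \Bigl( \sum_{j=1}^n \bigl( A_j^T A_j - A_j^T E_j A_j \bigr) \Bigr)^{-1} \sum_{j=1}^n \bigl( A_j^T b_j - A_j^T E_j b_j \bigr).$$

The estimate of the receiver clock offsets $c\,dt_j$ is found by back substitution.

Figure 15.7 Receiver clock offsets computed by batch processing and Kalman filtering. (Plotted quantity: receiver clock offset. Curves: batch processing, Kalman filter, extended Kalman filter.)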


The estimation model clearly demonstrates why it is not necessary to collect four
observations at all epochs for static observations. But a sufficient number of observations
is needed to keep (15.58) invertible. If only one observation is available at a particular
epoch, we can estimate the receiver clock offset, but not the position.
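The elimination has a simple interpretation: since $e_j$ is a vector of ones, $E_j$ replaces every entry of a vector by the epoch mean, so $I - E_j$ removes the epoch mean from each column of $A_j$ and from $b_j$. The Python sketch below uses this observation. It is only an illustration of the batch step, not the M-file recclock; the names `solve3` and `batch_clock` are hypothetical.

```python
def solve3(M, v):
    """Solve a 3 by 3 system by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(M)
    out = []
    for c in range(3):
        Mc = [[v[r] if j == c else M[r][j] for j in range(3)] for r in range(3)]
        out.append(det(Mc) / d)
    return out

def batch_clock(epochs):
    """epochs: list of (A_j, b_j), where A_j has one row [ax, ay, az] per satellite.
    Returns the position correction (x, y, z) and the list of offsets c*dt_j."""
    N = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for A, b in epochs:
        m = len(b)
        col_mean = [sum(row[c] for row in A) / m for c in range(3)]
        b_mean = sum(b) / m
        for row, bi in zip(A, b):
            # (I - E_j) removes the epoch mean from the columns of A_j and from b_j.
            a = [row[c] - col_mean[c] for c in range(3)]
            for p in range(3):
                rhs[p] += a[p] * (bi - b_mean)
                for q in range(3):
                    N[p][q] += a[p] * a[q]
    x = solve3(N, rhs)
    # Back substitution: c*dt_j = (e_j^T e_j)^{-1} e_j^T (b_j - A_j x).
    cdt = [sum(bi - sum(r[c] * x[c] for c in range(3)) for r, bi in zip(A, b)) / len(b)
           for A, b in epochs]
    return x, cdt
```

An epoch with a single observation contributes nothing to the reduced normal matrix, because its centered row is zero, but the back substitution still returns a clock offset for it.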

We recommend using the described procedure, as some manufacturers introduce
discontinuous changes in the clock time to keep the offsets within prescribed tolerances.
Certain receivers have their clocks reset when the offset approaches one millisecond. Fig- 
ure 15.6 demonstrates the jumps in the offset for this receiver type as well as another 
receiver type which has a steered clock. 

Figure 15.7 shows the offset for a steered receiver clock with continual corrections 
to reduce the offset. The plot is made by the M-file recclock. The code iterates three times 
to get the correct receiver position—the clock estimation is linear! 


Extended Kalman Filter 


All state vectors considered so far have been differential corrections $\delta x_k$ to some starting
value $X^0$. In particular, for a position filter the vector $X^0$ denotes the preliminary coordinates
$(X^0, Y^0, Z^0)$ of a (receiver) position. The matrix $A$ collects all the partial derivatives of
the observations $b_k$ with respect to the coordinates. This is all well known.

Sometimes it is also necessary to keep track of the total state vector $x_k = X^0 + \delta x_k$.
So we proceed to demonstrate how this is done. Linearize the observation equation:

$$b = g(X^0 + \delta x) + e = g(X^0) + \frac{\partial g}{\partial X}\bigg|_{X = X^0} \delta x + e = g(X^0) + G\,\delta x + e.$$

For the original state $X$, the observation is $b$ rather than $b - g(X^0)$, and we get

$$\delta\hat x_k = \delta\hat x_{k|k-1} + K_k \bigl( b_k - g(X^0) - G_k\, \delta\hat x_{k|k-1} \bigr).$$

The update equation, when we add $X^0$ to both sides, is now

$$X^0 + \delta\hat x_k = X^0 + \delta\hat x_{k|k-1} + K_k \bigl( b_k - g(X^0) - G_k\, \delta\hat x_{k|k-1} \bigr)$$

$$\hat x_k = \hat x_{k|k-1} + K_k \bigl( b_k - \hat b_{k|k-1} \bigr).$$

This is the usual linear update equation, for total rather than incremental quantities. It
simply says that we correct the a priori estimate by adding the observation residual, weighted
by $K_k$. Note that after the correction is made in the extended Kalman filter, the increment
$\delta\hat x_k$ is reduced to zero. The prediction is then trivial. The only non-trivial prediction
is $\hat x_{k|k-1}$ (which has become the nominal $X$ at $t_k$). This must be done through the nonlinear
dynamics of $X_{k|k-1} = g(X^0 + \hat x_k)$. Then we can form the predicted $\hat b_{k|k-1} = g(\hat x_{k|k-1})$.
The residual is $b_k - \hat b_{k|k-1}$ and we are prepared for the next loop.


Example 15.2 (Estimation of receiver clock offset by extended Kalman filter) The M-file 
kalclock uses an extended filter. After filtering of all observations in each epoch we do the 
following 


if extended_filter == 1
   % add the estimated correction to the position, then reset it to zero
   pos(1:3,1) = pos(1:3,1) + x(1:3,1);
   x(1:3,1) = [0; 0; 0];
end

rec_clk_offset = [rec_clk_offset x(4,1)]; 


This code implies that at the end of the first iteration we have an updated position which 
deviates only a small amount from the position computed in the batch run. For the file 
pta.960 the discrepancy in position is (0.12, 0.54, —0.19)m. So this small deviation is 
what we pay for a much faster computation using only one iteration instead of three. The 
result is shown in Figure 15.7. 


Example 15.3 We want to study a single one-way range between a satellite and a receiver.
We assume that P code pseudoranges and phase observations are taken on both frequencies
$L_1$ and $L_2$ for 50 epochs. As usual we denote the observations by $P_1$, $\Phi_1$, $P_2$, and $\Phi_2$.
Unlike all earlier instances, the observations are undifferenced! Our goal is to study how well
P code pseudoranges can help to estimate the ambiguities $N_1 - N_2$ and $N_1$ for undifferenced
observations. We describe ideas published in Euler & Goad (1991).

An appropriate filter is a sequential formulation of the Bayes version:

$$\begin{aligned}
\text{State prediction:} \quad & \hat x_{k|k-1} = F_k \hat x_{k-1|k-1} + \epsilon_k & (15.59) \\
\text{Covariance prediction:} \quad & P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + \Sigma_{\epsilon,k} & (15.60) \\
\text{Covariance update:} \quad & P_{k|k} = \bigl( P_{k|k-1}^{-1} + A_k^T \Sigma_{e,k}^{-1} A_k \bigr)^{-1} & (15.61) \\
\text{Gain matrix (after } P_{k|k}\text{):} \quad & K_k = P_{k|k} A_k^T \Sigma_{e,k}^{-1} & (15.62) \\
\text{State update:} \quad & \hat x_{k|k} = \hat x_{k|k-1} + K_k \bigl( b_k - A_k \hat x_{k|k-1} \bigr). & (15.63)
\end{aligned}$$

A remark is needed on the matrix $P_{k|k-1}^{-1}$ in (15.61). In case the state transition noise
is large, it becomes difficult to predict the corresponding entry of the state vector. This
situation is handled well by the predicted covariance matrix $P_{k|k-1} = \begin{bmatrix} \infty & \\ & \sigma^2 \end{bmatrix}$. Here $\infty$
denotes one very large variance, or a submatrix with almost infinite diagonal elements. The
inverse of this (block) matrix has the following form by the identity (17.47):

$$P_{k|k-1}^{-1} = \begin{bmatrix} 0 & 0 \\ 0 & \sigma^{-2} \end{bmatrix}.$$

This implies that the first entries of the previous state vector $x$ will have no influence on
the new state. The filter process that we now describe takes advantage of this behavior.
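Our observation equations $b_k = A_k x_k - e_k$ are identical to (15.30):

$$\begin{bmatrix} P_1 \\ \Phi_1 \\ P_2 \\ \Phi_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & \lambda_1 & 0 \\ 1 & (f_1/f_2)^2 & 0 & 0 \\ 1 & -(f_1/f_2)^2 & 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} \rho^* \\ I \\ N_1 \\ N_2 \end{bmatrix} - e.$$

The system equation is the steady model $x_{k|k-1} = x_{k-1|k-1}$, so we use the filter with
$F_k = I$. Hence (15.60) becomes

$$P_{k|k-1} = P_{k-1|k-1} + \Sigma_{\epsilon,k}. \tag{15.64}$$

The transition covariance matrix $\Sigma_{\epsilon,k}$ is diagonal. The (3, 3) and (4, 4) entries must be
zero to prevent $N_1$ and $N_2$ from changing. We have to allow for large changes of $\rho^*$ and $I$:

$$\Sigma_{\epsilon,k} = \begin{bmatrix} \infty & & & \\ & \infty & & \\ & & 0 & \\ & & & 0 \end{bmatrix}.$$

Again $\infty$ symbolizes a very large but finite number. The update from (15.60) is

$$P_{k|k-1} = P_{k-1|k-1} + \Sigma_{\epsilon,k} = \begin{bmatrix} \infty & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \infty & \sigma_{23} & \sigma_{24} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} & \sigma_{34} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_{44} \end{bmatrix}.$$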

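According to the Schur identity (17.47) we get

$$P_{k|k-1}^{-1} = \begin{bmatrix} 0 & 0 \\ 0 & \begin{bmatrix} \sigma_{33} & \sigma_{34} \\ \sigma_{43} & \sigma_{44} \end{bmatrix}^{-1} \end{bmatrix}$$

where each zero stands for a 2 by 2 block. This means that previous information on $\rho^*$ and $I$ is effectively neglected in the update of
the new state vector $\hat x_k$. The covariance matrix of the observations is

$$\Sigma_{e,k} = \begin{bmatrix} \sigma_{P_1}^2 & & & \\ & \sigma_{\Phi_1}^2 & & \\ & & \sigma_{P_2}^2 & \\ & & & \sigma_{\Phi_2}^2 \end{bmatrix} = \begin{bmatrix} \sigma_P^2 & & & \\ & 0.005^2 & & \\ & & \sigma_P^2 & \\ & & & 0.005^2 \end{bmatrix}.$$

Table 15.2 Standard deviation $\sigma_P$ for one-way ranges as function of elevation angle $h$

h (degrees)       0     10     20     30     40     50     60     70     80     90
sigma_P (m)    4.58   1.73   0.69   0.30   0.16   0.11   0.09   0.08   0.08   0.08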


Uncertainties depending on elevation can be modeled as an exponential expression for the
standard error $\sigma_P = a_0 + a_1 e^{-h/h_0}$ where $h_0$ is a scaled value of the elevation error.
In Section 16.3 we demonstrate how to estimate $a_0$, $a_1$, and $h_0$. Reasonable values are
$a_0 = 0.08$ m, $a_1 = 4.5$ m and $h_0 = 10°$. This gives $\sigma_P = 0.08 + 4.5\,e^{-h/10}$ in meters with $h$
in degrees, tabulated in Table 15.2. To repeat, $\sigma_P^2$ yields the entries (1, 1) and (3, 3) of $\Sigma_{e,k}$.
Phase measurements are considered to be independent of elevation angle.
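The elevation model is easy to evaluate. The short Python sketch below uses the values quoted in the text; the function name `sigma_p` is hypothetical.

```python
import math

A0, A1, H0 = 0.08, 4.5, 10.0  # meters, meters, degrees (values quoted in the text)

def sigma_p(h_deg):
    """Standard deviation of a one-way code range in meters at elevation h (degrees)."""
    return A0 + A1 * math.exp(-h_deg / H0)

# Variances for the (1,1) and (3,3) entries of the observation covariance matrix:
variances = {h: sigma_p(h) ** 2 for h in range(0, 100, 10)}
```

Evaluating `sigma_p` at 0, 10, ..., 90 degrees reproduces Table 15.2 to within the rounding of the printed values.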

The estimated values for the wide lane ambiguity $N_w = N_1 - N_2$ were used to form
double difference ambiguities $N_{w,ij}^{kl} = (N_{w,i}^k - N_{w,j}^k) - (N_{w,i}^l - N_{w,j}^l)$. In all cases the
computed values were in agreement with the "exact" values. So a good way of estimating
double difference ambiguities is to start from the one-way ambiguities. The estimates of
one-way ambiguities are independent of the length of the baseline. These cases can avoid
a computational need for a LAMBDA method.

A similar procedure for the $N_1$ ambiguity shows that the computed double difference
ambiguities never deviate more than two cycles from the true values. This is a really
promising procedure.

From the filter covariance matrix $P_{k|k}$ we can compute the covariance matrix for
the combinations $N_w = N_1 - N_2$ and $N_n = N_1 + N_2$. The smallest eigenvalue $\lambda_{\min}$ in
Table 15.3 determines the standard deviation of $N_1 - N_2$. The smaller it is, the more reliable is
the computation. Recall that the wide lane has wavelength $\lambda_w = 0.862$ m (from $1/\lambda_w = 1/\lambda_1 - 1/\lambda_2$)
and the narrow lane has $\lambda_n = 0.107$ m (from $1/\lambda_n = 1/\lambda_1 + 1/\lambda_2$). To
estimate the standard deviation $\sigma_w$ of the wide lane ambiguity we use $\sigma_w = \sqrt{\lambda_{\min}}\,\lambda_w = 0.05$ m.
The largest eigenvalue $\lambda_{\max}$ measures the difficulty in estimating $N_1 + N_2$. A
similar calculation yields $\sigma_n = \sqrt{\lambda_{\max}}\,\lambda_n = 0.49$ m which is more than four times $\lambda_n$! This
explains why it is always more difficult to calculate the narrow lane ambiguity $N_1 + N_2$.

The M-file one_way allows the reader to experiment with the enclosed data sets. 


Table 15.3 Eigenvalues of covariance matrix for narrow and wide lane ambiguities

mean of h    lambda_max    lambda_min
   60           0.276       0.000 07
   18          15.594       0.003 48
   20           7.424       0.001 66
   20          12.010       0.002 68
   70           0.235       0.000 06
                            0.000 73


Real-time Positioning Using Differential Carrier Phase


This book does not include the topic of real-time positioning. Yet we cannot resist
mentioning a promising recent development.

Most published models for computing real-time differential corrections for the rover
are based on simultaneity. The corrections calculated at the master $i$ are directly transmitted
to the rover $j$. But this procedure suffers in particular from a latency problem.

Lapucha & Barker & Liu (1996) give up the time matching at master and rover.
To describe their idea we repeat the basic equation (14.16) for a phase observation using
satellite $k$:

$$\Phi_i^k(t) = \rho_i^k - I_i^k + T_i^k + c\bigl( dt^k(t - \tau_i^k) - dt_i(t) \bigr) + \lambda\bigl( \varphi_i(t_0) - \varphi^k(t_0) \bigr) + \lambda N_i^k + \epsilon_i^k.$$

Dual frequency receivers eliminate the ionospheric delay $I_i^k$. The tropospheric
delay $T_i^k$ can be modeled and largely removed. In any case the distance between master and
rover can never exceed the distance over which the correction signal can be transmitted.
This is typically less than 25 km. The tropospheric correction over such distances nearly
cancels out in double differences. We also assume the ambiguity $N_i^k$ to be solved on-the-fly
(typical duration 15 to 30 s).

All information about a receiver position and dynamics is contained in the range $\rho_i^k$.
The remaining terms on the right generally are unknown and need to be accounted for in
the processing. However at the master we can determine a phase correction:

$$\widetilde\Phi_i^k(t) = \Phi_i^k - \bigl( \rho_i^k + c\,dt^{\text{Broadcast},k}(t - \tau_i^k) - c\,dt_i \bigr). \tag{15.65}$$

The approximate master clock offset $c\,dt_i(t)$ is computed using an appropriate algorithm.
The phase correction $\widetilde\Phi_i^k(t)$ is thus equivalent to

$$\widetilde\Phi_i^k(t) = c\,dt^{\text{SA},k}(t) - I_i^k + T_i^k + \lambda\bigl( \varphi_i(t_0) - \varphi^k(t_0) \bigr) + \lambda N_i^k + \epsilon_i^k. \tag{15.66}$$

The term $c\,dt^{\text{SA},k}(t) = c\,dt^k(t - \tau_i^k) - c\,dt^{\text{Broadcast},k}(t - \tau_i^k)$ represents the unknown
satellite clock dithering due to SA (selective availability).

The phase corrections given in (15.65) and (15.66) refer to the past time $t = t_0$. But
these corrections have to be extrapolated to the current user time $t_c$. Any second order
extrapolation involves errors that depend on the correction rates and their accelerations.
The changes in the observed carrier phase are mainly due to the satellite clock dithering
$c\,dt^{\text{SA},k}(t)$. The orbit and atmosphere errors vary slowly by comparison.

Experiments show that the correction accelerations can be as large as 0.01 m/s².
Neglecting these accelerations would cause an error of several centimeters in the
extrapolation. So the second order extrapolation model must be as follows:

$$\widetilde\Phi_i^k(t_c) = \widetilde\Phi_i^k(t_0) + \dot{\widetilde\Phi}{}_i^k(t_0)\,(t_c - t_0) + \tfrac12\, \ddot{\widetilde\Phi}{}_i^k(t_0)\,(t_c - t_0)^2. \tag{15.67}$$
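The size of the acceleration term in (15.67) is easy to check numerically. The Python sketch below evaluates the second order extrapolation; the function name and the sample numbers are hypothetical and serve only to illustrate the orders of magnitude mentioned in the text.

```python
def extrapolate(corr, rate, accel, t0, tc):
    """Second order extrapolation of a phase correction from t0 to tc, as in (15.67)."""
    dt = tc - t0
    return corr + rate * dt + 0.5 * accel * dt * dt

# Effect of neglecting an acceleration of 0.01 m/s^2 over a latency of 3 s:
with_accel = extrapolate(1.0, 0.2, 0.01, 0.0, 3.0)
without_accel = extrapolate(1.0, 0.2, 0.0, 0.0, 3.0)
neglect_error = with_accel - without_accel  # 0.5 * 0.01 * 3^2 = 0.045 m
```

The neglected term grows with the square of the latency, which is why a linear extrapolation is adequate only for very short correction ages.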


The phase rate $\dot{\widetilde\Phi}{}_i^k(t_0)$ and the acceleration $\ddot{\widetilde\Phi}{}_i^k(t_0)$ are estimated from the past
observations at the master $i$. At the rover $j$ one applies the extrapolated corrections (15.67):

$$\widehat\Phi_j^k(t_c) = \Phi_j^k(t_c) - \widetilde\Phi_i^k(t_c). \tag{15.68}$$

The actual observable used in the rover differential phase positioning filter is the difference
of the corrected phase observations with respect to a reference satellite $k$. The observation
model for the rover is thus derived by combining equations (15.65) and (15.68):

$$\widehat\Phi_{ij}^{kl}(t_c) = \rho_{ij}^{kl}(t_c) + N_{ij}^{kl}(t_c). \tag{15.69}$$

If $t_0 = t_c$ the formulation in (15.69) is equivalent to the double difference kinematic model:

$$\Phi_{ij}^{kl}(t_0) = \rho_{ij}^{kl}(t_0) + N_{ij}^{kl}(t_0).$$


Experiments show that the phase prediction error is on average below 5 cm at a correction 
update interval of 5 seconds. At this rate the positioning accuracy should be maintained 
at the several-centimeter level. This opens the possibility of using slower data links than 
those required for real time kinematic positioning to maintain the same position output rate. 
Applications of differential phase positioning include construction machine guidance and 
high resolution hydrographic surveying, where continuous output with minimum latency 
is a must. 


16 


RANDOM PROCESSES 


16.1 Random Processes in Continuous Time 


So far the order of observations has been of no concern. We collected all the observations 
and by least squares we estimated the parameters in x. However, there are important 
situations where the time for the observation does play a role. The observation made at 
time $t$ is denoted $x(t)$. The sequence of observations taken at times $t = t_1, t_2, \ldots, t_n$ is
denoted $x(t_1), x(t_2), \ldots, x(t_n)$. We are observing (and then estimating) a function of time.

Unavoidably there are errors in the observations. Each observation is a random variable
and the whole sequence $x(t)$, $t = t_1, t_2, \ldots, t_n$ is a random process. A process
is the evolution over time of a dynamic system. We must and shall develop a statistical 
theory for these functions. Classical statistical theory aims to infer the probability law of 
a random variable X from a finite number of independent observations X1, X2, ..., Xn. 
In this chapter we are observing a function that changes with time. We have a probability 
distribution for functions. 

The system consisting of the Earth and a GPS satellite is an example of a dynamic 
system. Their motions are governed by laws that depend only on current relative posi- 
tions and velocities. Such dynamic systems are often modeled by differential equations. 
We solve differential equations to obtain formulas for predicting the future behavior of 
dynamic systems. 

The vector x(t) is the state of the process. The original process b(t) is required to be 
a linear combination of the system variables via b(t) = A(t)x(t) + e(t). Except for this 
error e(t), the process b(t) can be recovered from the model x(t) by a linear combination 
of state variables. 

A linear random process in continuous time with state $x(t)$ and state covariance $\Sigma(t)$
has the model equations

$$\dot x(t) = F(t)\,x(t) + G(t)\,\epsilon(t) \tag{16.1}$$
$$b(t) = A(t)\,x(t) + e(t) \tag{16.2}$$
$$\dot\Sigma(t) = F(t)\,\Sigma(t) + \Sigma(t)\,F^T(t) + G(t)\,\Sigma_\epsilon(t)\,G^T(t). \tag{16.3}$$

The observation noise is measured by $e(t)$ and the system noise by $\epsilon(t)$ with covariance
$\Sigma_\epsilon(t)$. Often initial values for the state are given.

Example 16.1 (Random ramp) A process with random initial value $a_0$ and random
slope $a_1$ may be written as

$$b(t) = a_0 + a_1 t. \tag{16.4}$$

The differential equation corresponding to (16.4) is

$$\ddot b(t) = 0 \quad\text{with initial conditions}\quad b(0) = a_0 \text{ and } \dot b(0) = a_1.$$

This is a second order differential equation so the state vector $x$ for the process $b$ must
have two components. The dimension of the state vector equals the number of degrees of
freedom of the system. Using phase variables $x(t) = (x_1, x_2) = (b(t), \dot b(t))$ in the vector
model leads to

$$\begin{bmatrix} \dot x_1 \\ \dot x_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + e(t).$$

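Frequently random errors exhibit a definite time-growing behavior. A function which
grows linearly with time can be used to describe them. The growth rate $a_1$ is a random
quantity with a given probability density. Two state elements are necessary to describe the
model which is called a random ramp:

$$\dot x_1 = x_2 \quad\text{and}\quad \dot x_2 = 0. \tag{16.5}$$

The state $x_1$ is the random ramp process; $x_2$ is an auxiliary variable whose initial condition
provides the slope of the ramp. The solution of (16.5) is $x_1(t) = t\,x_2(0)$. The variance
of $x_1$ is seen to grow quadratically with time. With $\sigma^2$ denoting the variance of the slope $x_2(0)$, the covariance matrix is

$$\Sigma(t) = \begin{bmatrix} t^2\sigma^2 & t\sigma^2 \\ t\sigma^2 & \sigma^2 \end{bmatrix}; \quad\text{hence}\quad \dot\Sigma(t) = \begin{bmatrix} 2t\sigma^2 & \sigma^2 \\ \sigma^2 & 0 \end{bmatrix}.$$

We want to check this result by means of equation (16.3):

$$\begin{aligned}
\dot\Sigma(t) &= F(t)\Sigma(t) + \Sigma(t)F^T(t) + G(t)\Sigma_\epsilon(t)G^T(t) \\
&= \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} t^2\sigma^2 & t\sigma^2 \\ t\sigma^2 & \sigma^2 \end{bmatrix} + \begin{bmatrix} t^2\sigma^2 & t\sigma^2 \\ t\sigma^2 & \sigma^2 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + 0 \\
&= \begin{bmatrix} t\sigma^2 & \sigma^2 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} t\sigma^2 & 0 \\ \sigma^2 & 0 \end{bmatrix} = \begin{bmatrix} 2t\sigma^2 & \sigma^2 \\ \sigma^2 & 0 \end{bmatrix}.
\end{aligned}$$

This small computation verifies the validity of (16.3).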
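The same check can be repeated numerically. The Python sketch below, with hypothetical function names, evaluates the right side of (16.3) for the random ramp (no system noise) so that it can be compared with a finite difference of $\Sigma(t)$.

```python
def matmul(A, B):
    """Product of two 2 by 2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def sigma(t, var):
    """Covariance of the random ramp state at time t; var is the slope variance."""
    return [[t * t * var, t * var], [t * var, var]]

def sigma_dot(t, var):
    """Right side of (16.3) with F = [[0, 1], [0, 0]] and zero system noise."""
    F = [[0.0, 1.0], [0.0, 0.0]]
    Ft = [[F[j][i] for j in range(2)] for i in range(2)]
    S = sigma(t, var)
    FS, SFt = matmul(F, S), matmul(S, Ft)
    return [[FS[i][j] + SFt[i][j] for j in range(2)] for i in range(2)]

def sigma_dot_numeric(t, var, h=1e-3):
    """Central finite difference of sigma(t), for comparison."""
    Sp, Sm = sigma(t + h, var), sigma(t - h, var)
    return [[(Sp[i][j] - Sm[i][j]) / (2.0 * h) for j in range(2)] for i in range(2)]
```

For $t = 3$ and $\sigma^2 = 2$ both functions give the matrix with entries 12, 2, 2, 0, in agreement with the closed form above.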


Mean and Correlation 


In analogy with a single random variable we define the mean of an n-dimensional random
process. The mean is a vector $\mu$:

$$E\{x(t)\} = \mu = \int_{-\infty}^{\infty} x(t)\, p(x(t))\, dx \tag{16.6}$$

or component-wise

$$E\{x_i(t)\} = \mu_i = \int_{-\infty}^{\infty} x_i(t)\, p(x(t))\, dx, \quad i = 1, 2, \ldots, n. \tag{16.7}$$

A random process is called Gaussian or normal if its probability density function is normal.
The autocorrelation function for a random process $x(t)$ is defined as the expected
value of the product $x(t_1)\,x(t_2)^T$:

$$\text{Autocorrelation} \quad R_x(t_1, t_2) = E\{x(t_1)\,x(t_2)^T\} \tag{16.8}$$

where $t_1$ and $t_2$ are arbitrary observation times. Sometimes you find the correlation
properties of a random process described by means of the autocovariance function which is
defined as $E\{(x(t_1) - \mu(t_1))(x(t_2) - \mu(t_2))^T\}$. The two functions are obviously related.
The mean is included in the autocorrelation and the mean is subtracted in the autocovariance.
The two functions are identical for processes with zero mean.

The autocorrelation tells how well the process is correlated with itself at two different
times. A process with a rapidly decreasing autocorrelation function has a "short memory" and is
allowed to jump. A function with "long memory" entails a smoother process.


Stationarity 


A random process is stationary if the density functions $p(x(t))$ describing the process are
invariant under translation of time. This means that

$$p(x(t_1)) = p(x(t_1 + \tau)).$$

In this stationary case the autocorrelation function depends only on the time difference
$\tau = t_2 - t_1$. Thus $R_x$ reduces to a function of just one variable $\tau$:

$$\text{Stationary autocorrelation}\qquad R_x(\tau) = E\{x(t)x(t + \tau)^T\} \tag{16.9}$$

Stationarity assures us that the expectation does not depend separately on $t_1 = t$ and
$t_2 = t + \tau$, but only on the difference $\tau$.


Example 16.2 (Random walk) The process is the result of integrating uncorrelated signals.
The name indicates fixed-length steps in arbitrary directions. When the number of
steps $n$ is large and the individual steps are short, the distance travelled in a particular
direction resembles the random walk process.

The random walk then is described as $x(t_n) = l_1 + l_2 + \cdots + l_n$. By linearity the
mean value is $E\{x(t_n)\} = 0$. Let each $l_i$ have variance $\sigma^2 = 1$. Independent steps yield

$$\operatorname{Var}(x(t_n)) = 1 + 1 + \cdots + 1 = n.$$

For $n \to \infty$ the variance of $x$ tends to $\infty$; so the process is not stationary! For the
autocorrelation we have

$$R_x(t_1, t_2) = E\{x(t_1)x(t_2)\} = \int_0^{t_2}\!\!\int_0^{t_1} \delta(\tau - \sigma)\,d\tau\,d\sigma = \begin{cases} t_2 & \text{for } t_1 \ge t_2 \\ t_1 & \text{for } t_1 < t_2. \end{cases} \tag{16.10}$$

The continuous random walk process is also known as the Wiener process. The Wiener
process defined by (16.10) is often taken as the definition of the white noise process: 
Gaussian white noise is that hypothetical process which, when integrated, yields a Wiener
process. See also Example 16.6.
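The discrete counterpart of (16.10) can be checked exactly for a short walk. The sketch below assumes equally likely steps of $+1$ and $-1$ (unit variance) and enumerates all $2^n$ paths, so no random numbers are needed; the function name `exact_autocorrelation` is illustrative only.

```python
from itertools import product


def exact_autocorrelation(n):
    """Return corr(i, j) = E{x(t_i) x(t_j)} for an n-step walk with steps +1/-1.

    All 2**n equally likely step sequences are enumerated, so the
    expectation is exact. Times are counted from 1 to n.
    """
    paths = []
    for steps in product((-1, 1), repeat=n):
        position, path = 0, []
        for step in steps:
            position += step
            path.append(position)
        paths.append(path)

    def corr(i, j):
        return sum(p[i - 1] * p[j - 1] for p in paths) / len(paths)

    return corr


corr = exact_autocorrelation(8)
variance_after_8 = corr(8, 8)      # variance grows like the number of steps
cross_3_7 = corr(3, 7)             # equals the smaller of the two times
```

The variance after 8 steps is 8, and the correlation between times 3 and 7 is 3, the smaller of the two times, which is the pattern stated in (16.10).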


Cross-correlation 


Suppose we have two random processes $x(t)$ and $y(t)$. Their cross-correlation function
gives the expected values of all the products $x_i(t_1)y_j(t_2)$. In matrix form:

$$R_{xy}(t_1, t_2) = E\{x(t_1)y(t_2)^T\}. \tag{16.11}$$

If both processes are stationary, only the time difference $\tau = t_2 - t_1$ between sample points
is relevant. We again have a (matrix) function of $\tau$ alone:

$$R_{xy}(\tau) = E\{x(t)y(t + \tau)^T\}. \tag{16.12}$$

The cross-correlation function gives information on the mutual correlation between the two
processes. Note that $R_{xy}(\tau) = R_{yx}(-\tau)$ for scalar processes; in the matrix case the right
side is transposed. The sum of two stationary random processes
$z(t) = x(t) + y(t)$ has an autocorrelation function defined as

$$\begin{aligned}
R_z(\tau) &= E\{(x(t) + y(t))(x(t + \tau) + y(t + \tau))^T\} \\
&= E\{x(t)x(t + \tau)^T\} + E\{x(t)y(t + \tau)^T\} + E\{y(t)x(t + \tau)^T\} + E\{y(t)y(t + \tau)^T\} \\
&= R_x(\tau) + R_{xy}(\tau) + R_{yx}(\tau) + R_y(\tau).
\end{aligned} \tag{16.13}$$

If $x$ and $y$ are uncorrelated processes with zero mean, then $R_{xy} = 0$ and $R_{yx} = 0$ and

$$R_z(\tau) = R_x(\tau) + R_y(\tau). \tag{16.14}$$


The sum can be extended to more stationary and uncorrelated processes. 
We summarize some properties for stationary processes with zero mean: 


1 $R_x(0)$ is the mean-square value of the process $x(t)$.

2 $R_x(\tau)$ is an even function: $R_x(\tau) = R_x(-\tau)$. This follows from stationarity.

3 $|R_x(\tau)| \le R_x(0)$ for all $\tau$. The mean-square value of $x(t)$ equals that of $x(t + \tau)$.
The correlation coefficient between these two random variables is never greater than
unity, by the Schwarz inequality.

4 If $x(t)$ contains no periodic component, then $R_x(\tau)$ tends to zero as $\tau \to \infty$. Therefore
$x(t + \tau)$ becomes uncorrelated with $x(t)$ for large $\tau$, provided there are no hidden
periodicities in the process. Note that a constant is a special case of a periodic
function. Thus $R_x(\infty) = 0$ implies zero mean for the process.

5 The Fourier transform of $R_x(\tau)$ is real, symmetric, and nonnegative. The symmetry
follows directly from $R_x(\tau) = R_x(-\tau)$. The nonnegative property is not obvious at
this point. It will be justified in the section on the spectral density function for the
process.


The autocorrelation function is an important descriptor of a random process and is relatively
easy to obtain. Often the autocorrelation is all we know! Since there is always a
Gaussian random process that has this given autocorrelation, we generally assume that our
process is Gaussian. This uses the information we have (which is $R_x(\tau)$), and it makes the
optimal estimators linear.


Ergodicity 


In order to understand ergodicity we have to focus on averaging. In equation (9.24) we
introduced a sample average as

$$\hat{X} = \frac{X_1 + X_2 + \cdots + X_N}{N}.$$

This is the arithmetic mean. Note that $X_1, X_2, \ldots, X_N$ are numbers. The ensemble average
is the expectation of $X$. It is not an average $\hat{X}$ of an observed set of numbers. Finally
a time average is defined as

$$R_x(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} x(t)x(t + \tau)^T\,dt. \tag{16.15}$$

We want to stress that $\hat{X}$ and $E\{X\}$ concern the mean $\mu$ while $R_x(\tau)$ is a time
average. The two types of average estimate different objects.

A random process is ergodic if time averaging is equivalent to ensemble averaging. 
This implies that a single sample time signal of the process contains all possible statistical 
variations of the process. Hence observations from several sample times do not give more 
information than is obtained from a single sample time signal. 

Note that the autocorrelation function is the expectation of the product of $x(t_1)$
and $x(t_2)$. It can formally be written as

$$R_x(t_1, t_2) = E\{x(t_1)x(t_2)^T\} = \iint x(t_1)x(t_2)^T\,p_{x_1x_2}(x(t_1), x(t_2))\,dt_1\,dt_2. \tag{16.16}$$

However (16.16) is often not the simplest way of determining $R_x$ because the joint density
function $p_{x_1x_2}(x_1, x_2)$ must be known explicitly in order to evaluate the integral. If ergodicity
applies it is often easier to compute $R_x$ as a time average rather than as an ensemble
average.


Example 16.3 We want to study the concept of ergodicity for a stationary process which
is the ensemble of sinusoids of given amplitude $A$ and frequency $f$ and with a uniform
distribution of phase $\varphi$. All ensemble members are of the form

$$x(t) = A\sin(ft + \varphi).$$

The phase $\varphi$ is uniformly distributed over the interval $(0, 2\pi)$. Any average taken over the
members of this ensemble at any fixed time would find all phase angles represented with
equal probability density. The same is true of an average over all time on any one member.

The density function is $p_{x_1x_2} = \frac{1}{2\pi}$ for $0 \le \varphi \le 2\pi$. According to (16.16) the
ensemble average autocorrelation function is

$$\begin{aligned}
R_x(\tau) &= \int_0^{2\pi} A\sin(ft + \varphi)\,A\sin(ft + f\tau + \varphi)\,\frac{1}{2\pi}\,d\varphi \\
&= \frac{A^2}{4\pi} \int_0^{2\pi} \bigl(\cos(f\tau) - \cos(2ft + f\tau + 2\varphi)\bigr)\,d\varphi = \tfrac{1}{2}A^2\cos(f\tau).
\end{aligned}$$

The time average autocorrelation function according to (16.15) is

$$\begin{aligned}
R_x(\tau) &= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} A\sin(ft + \varphi)\,A\sin(ft + f\tau + \varphi)\,dt \\
&= \tfrac{1}{2}A^2 \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \bigl(\cos(f\tau) - \cos(2ft + f\tau + 2\varphi)\bigr)\,dt = \tfrac{1}{2}A^2\cos(f\tau).
\end{aligned}$$

The two results are equal, thus $x(t)$ is an ergodic process.
Note that any distribution of the phase angle $\varphi$ other than the uniform distribution
over an integral number of cycles would define a nonergodic process.
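The agreement of the two averages can also be seen numerically. The following sketch is an illustration only: it approximates the phase integral and the time integral by midpoint sums, with a finite window $T$ standing in for the limit $T \to \infty$. The function names and the sample values of $A$, $f$, $\varphi$ and $\tau$ are assumptions made for the example.

```python
import math


def ensemble_autocorr(A, f, t, tau, n=2000):
    """Average of x(t) x(t + tau) over a uniformly distributed phase."""
    total = 0.0
    for k in range(n):
        phi = 2 * math.pi * (k + 0.5) / n
        total += A * math.sin(f * t + phi) * A * math.sin(f * t + f * tau + phi)
    return total / n


def time_autocorr(A, f, phi, tau, T=500.0, n=100000):
    """Time average of x(t) x(t + tau) over [-T, T] for one fixed phase."""
    dt = 2 * T / n
    total = 0.0
    for k in range(n):
        t = -T + (k + 0.5) * dt
        total += A * math.sin(f * t + phi) * A * math.sin(f * t + f * tau + phi)
    return total * dt / (2 * T)


def closed_form(A, f, tau):
    """The value (1/2) A^2 cos(f tau) derived in Example 16.3."""
    return 0.5 * A * A * math.cos(f * tau)
```

With a finite window the time average differs from $\tfrac12 A^2\cos(f\tau)$ by a term of order $A^2/(4fT)$, so the two numbers agree only approximately; the ensemble average does not depend on the chosen time $t$.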


Power Spectral Density 
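The Fourier transform of the autocorrelation function

$$S_x(\omega) = \int_{-\infty}^{\infty} R_x(\tau)\,e^{-j\omega\tau}\,d\tau \tag{16.17}$$

is called the power spectral density function or power density spectrum of the random
process $x(t)$. The variable $\omega$ denotes the angular frequency in radians per second; it equals
$2\pi$ times the frequency in Hertz (cycles per second).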


Example 16.4 We shall study a Gauss-Markov process described by the autocorrelation
function

$$R_x(\tau) = \sigma^2 e^{-\alpha|\tau|}. \tag{16.18}$$

We calculate its power spectral density in two parts $\tau < 0$ and $\tau > 0$:

$$S_x(\omega) = \frac{\sigma^2}{\alpha - j\omega} + \frac{\sigma^2}{\alpha + j\omega} = \sigma^2\,\frac{\alpha - j\omega + \alpha + j\omega}{\alpha^2 + \omega^2} = \frac{2\sigma^2\alpha}{\alpha^2 + \omega^2}. \tag{16.19}$$

The M-file gmproc allows the reader to experiment with the shape of $R_x(\tau)$ and $S_x(\omega)$
for various values of $\sigma$ and $\alpha$. The functions $R_x(\tau)$ and $S_x(\omega)$ are shown in Figure 16.1
for $\sigma = 1$ and $\alpha = 1$. The correlation time is $1/\alpha$. Many physical phenomena are well
described by a Gauss-Markov process.

Figure 16.1 Autocorrelation $R_x(\tau)$ and spectral density function $S_x(\omega)$ for a Gauss-Markov
process with $\sigma = 1$ and $\alpha = 1$.
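The process may also be characterized as

$$x(t + \tau) = e^{-\alpha|\tau|}x(t) + \epsilon_\tau.$$

We assume the noise $\epsilon_\tau$ is normally distributed with zero mean and calculate the variance:

$$\begin{aligned}
E\{\epsilon_\tau^2\} &= E\{(x(t + \tau) - e^{-\alpha|\tau|}x(t))^2\} = E\{x(t + \tau)^2 - 2e^{-\alpha|\tau|}x(t + \tau)x(t) + e^{-2\alpha|\tau|}x(t)^2\} \\
&= \sigma^2 - 2\sigma^2 e^{-\alpha|\tau|}e^{-\alpha|\tau|} + \sigma^2 e^{-2\alpha|\tau|} = \sigma^2(1 - e^{-2\alpha|\tau|}).
\end{aligned} \tag{16.20}$$

We assumed that $\epsilon_\tau$ is independent of $x(t)$. This process is called an autoregression.
If $\{\epsilon_k\}$ are i.i.d. and we sample at $k = 1, 2, 3, \ldots$ the process is described as

$$x_k = \rho x_{k-1} + \epsilon_k \tag{16.21}$$

where $\rho = e^{-\alpha}$ when the sampling interval is $\tau = 1$.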


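The inverse transform of the spectral density $S_x(\omega)$ reconstructs the autocorrelation:

$$R_x(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_x(\omega)\,e^{j\omega\tau}\,d\omega. \tag{16.22}$$

For $\tau = 0$ this is

$$R_x(0) = E\{x^2(t)\} = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_x(\omega)\,d\omega. \tag{16.23}$$

As $R_x(\tau) = R_x(-\tau)$ we also get $S_x(\omega) = S_x(-\omega)$, so the power spectral density function
is a symmetric function in $\omega$.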

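Example 16.5 (Gauss-Markov process) We consider the spectral function (16.19)

$$S_x(\omega) = \frac{2\sigma^2\alpha}{\alpha^2 + \omega^2}.$$

Using (16.23) we should recover $\sigma^2$. So we perform the integration:

$$E\{x^2\} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{2\sigma^2\alpha}{\alpha^2 + \omega^2}\,d\omega = \frac{\sigma^2\alpha}{\pi} \left[\frac{1}{\alpha}\arctan\frac{\omega}{\alpha}\right]_{-\infty}^{\infty} = \sigma^2.$$

Taking the inverse Fourier transform of the Fourier transform we have recovered the original
function.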
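Both steps, the transform (16.19) and the recovery of $\sigma^2$ through (16.23), can be checked by numerical integration. The sketch below is a rough illustration with truncated integration ranges and midpoint sums; the function names, the truncation limits `L` and `W`, and the sample parameters are assumptions of this example and have no connection to the M-file gmproc.

```python
import math


def autocorrelation(tau, sigma, alpha):
    """Gauss-Markov autocorrelation sigma^2 exp(-alpha |tau|)."""
    return sigma ** 2 * math.exp(-alpha * abs(tau))


def psd_closed_form(omega, sigma, alpha):
    """Spectral density 2 sigma^2 alpha / (alpha^2 + omega^2)."""
    return 2 * sigma ** 2 * alpha / (alpha ** 2 + omega ** 2)


def psd_numeric(omega, sigma, alpha, L=40.0, n=80000):
    """Midpoint sum for the Fourier integral of R over [-L, L].

    R is even, so only the cosine part of exp(-j omega tau) contributes.
    """
    d = 2 * L / n
    total = 0.0
    for k in range(n):
        tau = -L + (k + 0.5) * d
        total += autocorrelation(tau, sigma, alpha) * math.cos(omega * tau)
    return total * d


def variance_from_psd(sigma, alpha, W=2000.0, n=400000):
    """(1 / 2 pi) times the integral of the spectral density over [-W, W]."""
    d = 2 * W / n
    total = sum(psd_closed_form(-W + (k + 0.5) * d, sigma, alpha) for k in range(n))
    return total * d / (2 * math.pi)
```

Because the density decays only like $1/\omega^2$, cutting the frequency integral at $\pm W$ leaves out about $2\sigma^2\alpha/(\pi W)$, so the recovered variance falls slightly below $\sigma^2$.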
Figure 16.2 Autocorrelation $R_x(\tau)$ and spectral density function $S_x(\omega)$ for bandlimited
white noise.


We should like to comment on the interpretation of the power spectral density. White noise
is similar to the buzz in the radio and it contains all tones (frequencies). Its power spectral
density function is $S_x(\omega) = \text{constant}$. By contrast, the power spectral density of a pure
cosine waveform is a delta function (actually two delta functions, at $+\omega$ and $-\omega$).
The spectral density reflects the composition of frequencies in the
signal $x(t)$. The following example discusses the subject in a little more detail.


Example 16.6 Which autocorrelation function corresponds to a constant power spectral
density $S_0$ from $-\infty$ to $\infty$? The power is distributed uniformly over all frequencies. In
analogy to the case of white light such a random process is called white noise:

$$R_x(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_0\,e^{j\omega\tau}\,d\omega = S_0\,\delta(\tau) \quad\text{(informally)}.$$

At $\tau = 0$, the Dirac delta $\delta(\tau)$ yields $R_x(0) = S_0\,\delta(0) = \infty$. This is $E\{(x(t))^2\}$. Thus
white noise is an idealized process.

Another characteristic of sound is the bandwidth. Often the bandwidth of the noise
is made wide compared to the bandwidth of the system. We define a bandlimited white
noise as constant in a finite range of frequencies and zero elsewhere:

$$S_x(\omega) = \begin{cases} A & \text{for } |\omega| \le 2\pi W \\ 0 & \text{for } |\omega| > 2\pi W. \end{cases}$$

$W$ is the physical bandwidth in Hertz. The autocorrelation of $S_x(\omega)$ is the inverse transform
of this box function, which produces the sinc function $\sin x / x$:

$$R_x(\tau) = 2WA\,\frac{\sin(2\pi W\tau)}{2\pi W\tau}.$$

This is not bandlimited. It is impossible for both $S(\omega)$ and $R(\tau)$ to be supported on finite
intervals. Heisenberg's Uncertainty Principle, see page 225, gives a precise limitation
$\sigma_S\sigma_R \ge \tfrac{1}{2}$ on their variances.

Both the autocorrelation and spectral density functions for bandlimited white noise
are sketched in Figure 16.2. The function $R_x(\tau)$ is zero for $\tau = \frac{1}{2W}, \frac{2}{2W}, \frac{3}{2W}, \ldots$. Thus if
the process is sampled at the Nyquist rate of $2W$ samples/sec, the resulting discrete random
variables are uncorrelated. Since this usually simplifies the analysis, the white bandlimited
assumption is frequently made.
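The closed form for the bandlimited autocorrelation and its zeros at multiples of $1/(2W)$ can be checked with a few lines of code. This is a sketch under the assumption that the spectral density is the box of height $A$ on $|\omega| \le 2\pi W$ and that the inverse transform carries the factor $1/(2\pi)$ as in (16.22); the function names are illustrative.

```python
import math


def bandlimited_autocorr(tau, A, W):
    """Closed form 2WA sin(2 pi W tau) / (2 pi W tau), with value 2WA at tau = 0."""
    x = 2 * math.pi * W * tau
    if x == 0:
        return 2 * W * A
    return 2 * W * A * math.sin(x) / x


def inverse_transform(tau, A, W, n=20000):
    """(1 / 2 pi) times the integral of A cos(omega tau) over |omega| <= 2 pi W."""
    B = 2 * math.pi * W
    d = 2 * B / n
    total = 0.0
    for k in range(n):
        omega = -B + (k + 0.5) * d
        total += A * math.cos(omega * tau)
    return total * d / (2 * math.pi)


# Autocorrelation at the Nyquist sampling instants k / (2W), k = 1, ..., 5
nyquist_values = [bandlimited_autocorr(k / (2 * 50.0), 3.0, 50.0) for k in range(1, 6)]
```

The values at the sampling instants vanish up to rounding, which is why samples taken at the Nyquist rate are uncorrelated.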
Table 16.1 System models of continuous random processes

Process                      Autocorrelation $R_x(\tau)$              Power spectral density $S_x(\omega)$

White noise                  $\sigma^2\delta(\tau)$                   $\sigma^2$ (constant)
Random walk (a)              undefined                                $\propto \sigma^2/\omega^2$
Random constant              $\sigma^2$                               $2\pi\sigma^2\delta(\omega)$
Exponentially correlated     $\sigma^2 e^{-\alpha|\tau|}$, where      $2\sigma^2\alpha/(\omega^2 + \alpha^2)$
or Gauss-Markov              $1/\alpha$ = correlation time

(a) Not a stationary process, hence undefined $R_x(\tau)$.


16.2 Random Processes in Discrete Time 


Chapter 9 started with continuous random variables. In the present chapter we proceeded
similarly. However in geodesy most applications involve discrete time and we must now
discretize the time parameter. A discrete-time linear random process replaces the differential
equation $\dot{x}(t) = F(t)x(t) + G(t)\epsilon(t)$ for the state $x(t)$ by a difference equation for the
state $x_k = x(k)$:

$$\begin{aligned} x_k &= F_{k-1}x_{k-1} + G_k\epsilon_k \\ b_k &= A_kx_k + e_k. \end{aligned} \tag{16.24}$$

Suppose the uncorrelated process noise $\epsilon_k$ in the state equation has covariance matrix
$\Sigma_{\epsilon,k}$. If the observation noise $e_k$ is uncorrelated and with zero mean, we have by
covariance propagation the following recursion formula for the covariance of $x_k$:

$$\Sigma_k = F_{k-1}\Sigma_{k-1}F_{k-1}^T + G_k\Sigma_{\epsilon,k}G_k^T. \tag{16.25}$$
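The recursion (16.25) is easy to run by hand for small systems. The sketch below is an illustration in plain Python with constant matrices $F$, $G$ and $\Sigma_\epsilon$ (an assumption made for brevity; (16.25) allows them to vary with $k$). The function names are hypothetical.

```python
def matmul(a, b):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def propagate_covariance(sigma_prev, F, G, sigma_eps):
    """One step of (16.25): F Sigma F^T + G Sigma_eps G^T."""
    FSF = matmul(matmul(F, sigma_prev), transpose(F))
    GQG = matmul(matmul(G, sigma_eps), transpose(G))
    n = len(FSF)
    return [[FSF[i][j] + GQG[i][j] for j in range(n)] for i in range(n)]


def run(sigma0, F, G, sigma_eps, steps):
    sigma = sigma0
    for _ in range(steps):
        sigma = propagate_covariance(sigma, F, G, sigma_eps)
    return sigma


# Scalar random walk with unit step variance: the variance grows by 1 per step.
walk = run([[0.0]], [[1.0]], [[1.0]], [[1.0]], 5)

# Noise-free ramp with unit time step and slope variance 2.
ramp = run([[0.0, 0.0], [0.0, 2.0]],
           [[1.0, 1.0], [0.0, 1.0]],
           [[1.0, 0.0], [0.0, 1.0]],
           [[0.0, 0.0], [0.0, 0.0]], 3)
```

After three steps the ramp covariance has entries $18, 6, 6, 2$, which matches the continuous random ramp covariance with entries $t^2\sigma^2$, $t\sigma^2$, $\sigma^2$ at $t = 3$ and $\sigma^2 = 2$.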


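However, the model (16.24) allows for time correlation in the random process noise $\epsilon_k$.
Such correlation often occurs in models from practice. It can be handled correctly by an
augmentation of the state vector $x_k$. Suppose $\epsilon_k$ can be split into correlated quantities $\epsilon_{1,k}$
and uncorrelated quantities $\epsilon_{2,k}$: $\epsilon_k = \epsilon_{1,k} + \epsilon_{2,k}$. We suppose that $\epsilon_{1,k}$ can be modelled
as a difference equation

$$\epsilon_{1,k} = G_\epsilon\,\epsilon_{1,k-1} + \epsilon_{3,k-1}$$

where $\epsilon_3$ is a vector of uncorrelated noises. Then the augmented state vector $x_k'$ is given by

$$x_k' = \begin{bmatrix} x_k \\ \epsilon_{1,k} \end{bmatrix}$$

and the augmented state equation, driven only by uncorrelated disturbances, is

$$\begin{bmatrix} x_k \\ \epsilon_{1,k} \end{bmatrix} = \begin{bmatrix} F & G \\ 0 & G_\epsilon \end{bmatrix} \begin{bmatrix} x_{k-1} \\ \epsilon_{1,k-1} \end{bmatrix} + \begin{bmatrix} G & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} \epsilon_{2,k-1} \\ \epsilon_{3,k-1} \end{bmatrix}. \tag{16.26}$$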
Next we consider four specific correlation models for system disturbances. In each
case scalar descriptions are presented.


Example 16.7 (Random constant) The random constant is a non-dynamic quantity with a
fixed random amplitude. The process is described by

$$x_k = x_{k-1}.$$

The random constant may have a random initial condition $x_0$.


Example 16.8 (Random walk) The process is described by

$$x_k = x_{k-1} + \epsilon_k.$$

The variance of the noise is

$$E\{\epsilon_k^2\} = E\{(x_k - x_{k-1})^2\} = E\{x_k^2\} + E\{x_{k-1}^2\} - 2E\{x_kx_{k-1}\} = \sigma^2,$$

since for steps of variance $\sigma^2$ and $x_0 = 0$ we have $E\{x_k^2\} = k\sigma^2$ and, as in (16.10),
$E\{x_kx_{k-1}\} = (k-1)\sigma^2$.


Example 16.9 (Random ramp) The random ramp is a process growing linearly with time.
The growth rate of the random ramp is a random quantity with given variance. We need
two state elements to describe the random ramp:

$$\begin{aligned} x_{1,k} &= x_{1,k-1} + (t_k - t_{k-1})\,x_{2,k-1} + \epsilon_{1,k} \\ x_{2,k} &= x_{2,k-1} + \epsilon_{2,k}. \end{aligned}$$


Example 16.10 (Exponentially correlated random variable)

$$x_k = e^{-\alpha(t_k - t_{k-1})}x_{k-1} + \epsilon_k.$$

We have $\epsilon_k = x_k - e^{-\alpha(t_k - t_{k-1})}x_{k-1}$. The time difference is $\Delta t = t_k - t_{k-1}$. According to
(16.20) we have $E\{\epsilon_k^2\} = \sigma^2(1 - e^{-2\alpha\Delta t})$.

The random processes in Examples 16.7-16.10 form the basis of many linear filters.
Next we present three examples related to GPS applications, see Axelrad & Brown (1996).


Example 16.11 (Discrete random ramp) Often random errors exhibit a definite time-growing
behavior. The discrete random ramp, a function that grows linearly with time,
can often be used to describe them. The growth rate of the random ramp is a random quantity
with a given variance. A good example of this model is the behavior of the offset $b$ and
the drift $d$ of a GPS receiver clock.

Two state components are necessary to describe the random ramp. So we use a
vector $x_k$ and a matrix equation:

$$\text{State equation}\qquad x_k = Fx_{k-1} + \epsilon_k \quad\text{with}\quad x_k = \begin{bmatrix} b_k \\ d_k \end{bmatrix} \quad\text{and}\quad F = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}. \tag{16.27}$$

The offset $b$ is the random ramp process. The drift $d$ describes the slope of the ramp. The
second row of $F$ gives random changes of slope from $d_{k-1}$ to $d_k$. The first row gives the
random ramp:

$$b_k = b_{k-1} + \Delta t\,d_{k-1} + \text{random error}.$$
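Next we shall estimate the covariance matrix $\Sigma_\epsilon = E\{\epsilon\epsilon^T\}$ of the errors $\epsilon$ in the state
equation. We start with a continuous system formulation, and integrate over a time step:

$$\epsilon_k = \int_{t_{k-1}}^{t_k} F(t_k, \tau)\,\epsilon(\tau)\,d\tau$$

which yields

$$\begin{aligned}
E\{\epsilon_k\epsilon_k^T\} &= E\left\{\int_{t_{k-1}}^{t_k}\!\!\int_{t_{k-1}}^{t_k} F(t_k, \tau)\,\epsilon(\tau)\,\epsilon(\sigma)^T F(t_k, \sigma)^T\,d\tau\,d\sigma\right\} \\
&= \int_{t_{k-1}}^{t_k} F(t_k, \tau)\,\Sigma(\tau)\,F(t_k, \tau)^T\,d\tau.
\end{aligned}$$

The matrix $\Sigma(\tau)$ is the spectral density matrix. Let the spectral amplitudes for the offset
and drift be $s_b$ and $s_d$:

$$\Sigma(\tau) = \begin{bmatrix} s_b & 0 \\ 0 & s_d \end{bmatrix}.$$

Then the integrand is a 2 by 2 matrix:

$$\begin{bmatrix} 1 & \tau \\ 0 & 1 \end{bmatrix} \begin{bmatrix} s_b & 0 \\ 0 & s_d \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \tau & 1 \end{bmatrix} = \begin{bmatrix} s_b + s_d\tau^2 & s_d\tau \\ s_d\tau & s_d \end{bmatrix}.$$

Therefore we get a formula for the covariance matrix:

$$\Sigma_{\epsilon,k} = \int_0^{\Delta t} \begin{bmatrix} s_b + s_d\tau^2 & s_d\tau \\ s_d\tau & s_d \end{bmatrix} d\tau = \begin{bmatrix} s_b\Delta t + s_d(\Delta t)^3/3 & s_d(\Delta t)^2/2 \\ s_d(\Delta t)^2/2 & s_d\Delta t \end{bmatrix}. \tag{16.28}$$

Typical values for the white noise spectral amplitudes $s_b$ and $s_d$ in GPS receiver clocks are
$4 \times 10^{-19}$ and $15 \times 10^{-19}$. A typical time step is $\Delta t = 20$ s. In this case (16.28) gives the
covariance matrix

$$\Sigma_{\text{clock}} = \begin{bmatrix} 40\,080 & 3\,000 \\ 3\,000 & 300 \end{bmatrix} \cdot 10^{-19}.$$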
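The clock covariance of (16.28) is a short computation. The sketch below evaluates the closed form for the quoted values $s_b = 4 \times 10^{-19}$, $s_d = 15 \times 10^{-19}$ and $\Delta t = 20$ s, and cross-checks it against a midpoint integration of the 2 by 2 integrand; the function names are illustrative assumptions.

```python
def clock_covariance(s_b, s_d, dt):
    """Closed form (16.28) for the covariance of the clock offset and drift noise."""
    return [[s_b * dt + s_d * dt ** 3 / 3, s_d * dt ** 2 / 2],
            [s_d * dt ** 2 / 2, s_d * dt]]


def clock_covariance_numeric(s_b, s_d, dt, n=1000):
    """Midpoint integration of the integrand of (16.28) over [0, dt]."""
    h = dt / n
    q = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n):
        tau = (k + 0.5) * h
        q[0][0] += (s_b + s_d * tau * tau) * h
        q[0][1] += s_d * tau * h
        q[1][0] += s_d * tau * h
        q[1][1] += s_d * h
    return q


Q = clock_covariance(4e-19, 15e-19, 20.0)
Q_scaled = [[entry / 1e-19 for entry in row] for row in Q]  # in units of 1e-19
```

In units of $10^{-19}$ the entries come out as $40\,080$, $3\,000$, $3\,000$ and $300$; the cubic term from the drift dominates the offset variance at this step length.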
Example 16.12 A process model for a GPS receiver includes the three coordinates of the
receiver position combined with the clock offset and the clock drift. The dynamic model
is still given by (16.27) and the state vector $x_k$ has five components:

$$x_k = \begin{bmatrix} x \\ y \\ z \\ b \\ d \end{bmatrix}_k \quad\text{and}\quad F = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & \Delta t \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$

The covariance matrix for this static receiver is

$$\Sigma_{\epsilon,\text{static}} = E\{\epsilon\epsilon^T\} = \begin{bmatrix} \Sigma_{\text{position}} & 0 \\ 0 & \Sigma_{\text{clock}} \end{bmatrix}. \tag{16.29}$$

The matrix $\Sigma_{\text{clock}}$ reflects the random contribution from the receiver clock. The covariance
matrix $\Sigma_{\text{position}}$ describes the model noise related to the position. When the receiver is kept
at a fixed position (static receiver) it would be natural to set $\Sigma_{\text{position}} = 0$. This however
would imply that all new position information is ignored and that is not meaningful. So
"artificially" we let the position have small variances in order that the filter does not get
stuck.


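Example 16.13 A kinematic receiver is a GPS receiver that is moved around. Often it
goes on board a vehicle with low velocity and without sudden shifts. Now the state vector
includes eight components: three coordinates, three velocities, and two clock terms:

$$x_k = \begin{bmatrix} x \\ y \\ z \\ \dot{x} \\ \dot{y} \\ \dot{z} \\ b \\ d \end{bmatrix}_k \quad\text{and}\quad F = \begin{bmatrix} 1 & 0 & 0 & \Delta t & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & \Delta t & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & \Delta t & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & \Delta t \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$

The covariance matrix is

$$\Sigma_{\epsilon,\text{kinematic}} = E\{\epsilon\epsilon^T\} = \begin{bmatrix} \Sigma_{\text{position}} & \Sigma_{\text{position, velocity}} & 0 \\ \Sigma_{\text{position, velocity}}^T & \Sigma_{\text{velocity}} & 0 \\ 0 & 0 & \Sigma_{\text{clock}} \end{bmatrix}. \tag{16.30}$$

The matrix $\Sigma_{\text{velocity}}$ often uses different values for the horizontal and vertical components.
A car does not substantially change its vertical velocity. But it can accelerate or decelerate
rapidly. Of course if the variances in the diagonal terms of $\Sigma_{\epsilon,\text{kinematic}}$ are large, a filtering
process, as described in the next chapter, will not improve the accuracy of the position
very much.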
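The block structure of the kinematic transition matrix is easy to build and test in code. The following sketch assumes the state ordering three coordinates, three velocities, clock offset, clock drift, and a single noise-free prediction step; the function names and the sample state are illustrative.

```python
def kinematic_transition(dt):
    """8 by 8 transition matrix: positions advance by velocity * dt,
    the clock offset advances by drift * dt, everything else is carried over."""
    F = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
    for i in range(3):
        F[i][i + 3] = dt      # position i is coupled to velocity i
    F[6][7] = dt              # clock offset is coupled to clock drift
    return F


def predict(F, x):
    """Noise-free prediction F x."""
    return [sum(F[i][j] * x[j] for j in range(len(x))) for i in range(len(F))]


F = kinematic_transition(2.0)
state = [10.0, 20.0, 30.0, 1.0, 2.0, 3.0, 5.0, 0.1]
predicted = predict(F, state)
```

With $\Delta t = 2$ the positions move from $(10, 20, 30)$ to $(12, 24, 36)$ and the clock offset from $5$ to $5.2$, while velocities and drift are unchanged; only the process noise of (16.30) lets them vary.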


Example 16.14 (Gauss-Markov process) Let $x_k$ be a stationary random process with zero
mean and exponentially decreasing autocorrelation:

$$R_x(t_2 - t_1) = \sigma^2 e^{-\alpha|t_2 - t_1|}.$$

This type of random process can be modeled as the output of a linear system, when the
input $\epsilon_k$ is zero-mean white noise with power spectral density equal to unity. (In standard
time series literature this is called an AR(1) model. AR(1) means autoregressive of
order 1.) A difference equation model for this type of process is

$$\begin{aligned} x_k &= Fx_{k-1} + G\epsilon_k \\ b_k &= x_k. \end{aligned} \tag{16.31}$$

In order to use this model we need to solve for the unknown scalar parameters $F$ and $G$ as
functions of the parameter $\alpha$. To do so we multiply (16.31) by $x_{k-1}$ on both sides and take
expected values to obtain the equations

$$\begin{aligned} E\{x_kx_{k-1}\} &= F\,E\{x_{k-1}x_{k-1}\} + G\,E\{\epsilon_kx_{k-1}\} \\ \sigma^2e^{-\alpha} &= F\sigma^2 \end{aligned} \tag{16.32}$$

assuming that the $\epsilon_k$ are uncorrelated and $E\{\epsilon_k\} = 0$ so that $E\{\epsilon_kx_{k-1}\} = 0$. The transition
matrix (factor) is $F = e^{-\alpha}$. Next square the state variable defined by (16.31) and take its
expected value:

$$\begin{aligned} E\{x_k^2\} &= F^2E\{x_{k-1}x_{k-1}\} + G^2E\{\epsilon_k\epsilon_k\} \\ \sigma^2 &= \sigma^2F^2 + G^2 \end{aligned} \tag{16.33}$$

because the system variance is $E\{\epsilon_k^2\} = 1$. We insert $F = e^{-\alpha}$ into (16.33) and get
$G = \sigma\sqrt{1 - e^{-2\alpha}}$. The complete model is then

$$\begin{aligned} x_k &= e^{-\alpha}x_{k-1} + \sigma\sqrt{1 - e^{-2\alpha}}\,\epsilon_k \\ b_k &= x_k \end{aligned}$$

with $E\{\epsilon_k\} = 0$ and $E\{\epsilon_k\epsilon_j\} = \delta_{jk}$.

Table 16.2 System models of discrete-time random processes

Process                     Autocorrelation $R_x$                              State-space model

Random constant                                                                $x_k = x_{k-1}$, $\sigma^2\{x_0\} = \sigma^2$
Random walk                                                                    $x_k = x_{k-1} + \epsilon_k$, $\sigma^2\{x_0\} = 0$
Random ramp                                                                    $x_{1,k} = x_{1,k-1} + \Delta t\,x_{2,k-1}$
                                                                               $x_{2,k} = x_{2,k-1}$
Exponentially correlated    $R_x(\Delta t) = \sigma^2e^{-\alpha|\Delta t|}$    $x_k = e^{-\alpha|\Delta t|}x_{k-1} + \epsilon_k$,
                                                                               $\sigma^2\{x_0\} = \sigma^2$, $\Delta t_k = t_k - t_{k-1}$
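The coefficients $F$ and $G$ can be checked without simulation by iterating the second-moment equations. The sketch below assumes unit-variance white noise $\epsilon_k$ and a sampling interval `dt` (equal to 1 in Example 16.14; other values are an extension in the spirit of Example 16.10). The function names are illustrative.

```python
import math


def gauss_markov_coefficients(sigma, alpha, dt=1.0):
    """F and G of the model x_k = F x_{k-1} + G eps_k with unit-variance eps_k."""
    F = math.exp(-alpha * dt)
    G = sigma * math.sqrt(1.0 - math.exp(-2.0 * alpha * dt))
    return F, G


def second_moments(F, G, var0, lags, steps=200):
    """Iterate var_k = F^2 var_{k-1} + G^2, as in (16.33), then return the
    variance and the covariances E{x_{k+m} x_k} = F^m var for m = 0, ..., lags-1."""
    var = var0
    for _ in range(steps):
        var = F * F * var + G * G
    return var, [F ** m * var for m in range(lags)]


sigma, alpha = 1.5, 0.3
F, G = gauss_markov_coefficients(sigma, alpha)
variance, covariances = second_moments(F, G, var0=0.0, lags=5)
```

Starting from zero variance, the recursion settles at $\sigma^2$, and the covariance at lag $m$ is $\sigma^2 e^{-\alpha m}$, which is the exponential autocorrelation the model was designed to reproduce.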
Ideally a random process should be based on the physical laws that govern the noise of the
system errors. An exact representation is often impossible, either because the underlying
physics is not well understood or because implementing the ideal random process would
yield a cumbersome solution. The Gauss-Markov model (exponential decay in correlation)
is extremely useful, requiring only one parameter $\alpha$.
In applied works it only rarely happens that the given physical problem is in the exact form

$$\begin{aligned} x_k &= F_{k-1}x_{k-1} + \epsilon_k \\ b_k &= A_kx_k + e_k. \end{aligned}$$

Figure 16.3 Correctly and incorrectly filtered random walk process. (Plot titled "Simulated random walk process": state variable against time from 0 to 100, showing the observations, the random walk process, the filter for the correct model, and the filter for the incorrect model.)


Most often the original problem has to be modified to fit into the appropriate form. This 
twisting around is often referred to as modeling and it is all-important in Kalman filtering 
applications. Good modeling leads to good results; bad modeling leads to bad results. It is 
as simple as that. There are no set rules for the modeling procedure, and it often requires 
some imagination. Perhaps the best way to become adept at modeling is to look at a wide 
variety of examples. 

We start with an example from Brown & Hwang (1997) demonstrating the effects of
mis-modeling.
Example 16.15 Consider a process that is actually random walk but is incorrectly modeled
as a random constant $c$. We have then for the true model

$$\begin{aligned} x_k &= x_{k-1} + \epsilon_k, & \epsilon_k &= \text{unity Gaussian white noise}, & \sigma^2\{x_0\} &= 1 \\ b_k &= x_k + e_k, & k &= 0, 1, 2, \ldots & \sigma^2\{e_k\} &= 0.1 \end{aligned}$$

and for the incorrect model

$$\begin{aligned} x_k &= c, & c &\sim N(0, 1) \\ b_k &= x_k + e_k, & k &= 0, 1, 2, \ldots \quad \sigma^2\{e_k\} = 0.1. \end{aligned}$$

The wrong model has $F_k = 1$, $\Sigma_{\epsilon,k} = 0$, $A_k = 1$, $\Sigma_{e,k} = 0.1$, $\hat{x}_0 = 0$, and $\Sigma_0 = 1$. For
the true model the parameters are the same except that $\Sigma_{\epsilon,k} = 1$, rather than zero.

The random walk $x_0, x_1, \ldots$ was simulated using Gaussian random numbers with
zero mean and unity variance. The resulting process for 100 seconds is shown in Figure 16.3.
A measurement sequence $b_k$ of this sample process was also generated using
another set of $N(0, 1)$ random numbers for $e_k$. This measurement sequence was first processed
using the incorrect model ($\Sigma_{\epsilon,k} = 0$), and again with the correct model ($\Sigma_{\epsilon,k} = 1$).
The results are shown along with the sample process in Figure 16.3. In this case the measurement
noise is relatively small ($\sigma \approx 0.3$), and we note that the estimate of the mis-modeled
filter does very poorly after the first few steps. This is due to the filter's gain decreasing
with each succeeding step. At the 100th step the gain is more than two orders of magnitude
less than at the beginning. Thus the filter becomes very sluggish and will not follow the
random walk. Had the simulation been allowed to go further on, it would have become
even more sluggish.
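The decay of the gain can be reproduced without any random numbers, because the gain of a scalar Kalman filter depends only on the model variances. The sketch below assumes the standard scalar gain and variance update (the filter itself is described in the next chapter), with measurement variance 0.1 and initial variance 1 as in Example 16.15; the function name `kalman_gains` is illustrative.

```python
def kalman_gains(process_variance, measurement_variance=0.1,
                 initial_variance=1.0, steps=101):
    """Gain sequence K_0, K_1, ... of a scalar filter with F = 1 and A = 1.

    P is the variance of the prediction before each measurement:
    K = P / (P + R), then P becomes (1 - K) P + Q for the next step.
    """
    gains = []
    P = initial_variance
    for _ in range(steps):
        K = P / (P + measurement_variance)
        gains.append(K)
        P = (1.0 - K) * P + process_variance
    return gains


wrong = kalman_gains(process_variance=0.0)     # random constant model
correct = kalman_gains(process_variance=1.0)   # random walk model
gain_ratio = wrong[0] / wrong[100]
```

Under these assumptions the gain of the random constant model is $1/(1.1 + k)$ at step $k$, so it falls from about 0.91 to about 0.01 over 100 steps, a factor of roughly 90. The gain of the random walk model settles near 0.92, so that filter keeps following the measurements.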


The moral to Example 16.15 is simply this. Any model that assumes the process, or any 
facet of the process, to be absolutely constant forever and ever is a risky model. In the 
physical world, very few things remain absolutely constant. The obvious remedy for this 
type of divergence problem is always to insert some process noise into each of the state 
variables. Do this even at the risk of some degree of suboptimality; it makes for a much 
safer filter than otherwise. It also helps with potential roundoff problems. Often a random 
walk model is a safer model where the time span is large, and it is usually preferred over 
the truly constant model. 

Choosing an appropriate process model is always an important consideration. A 
certain amount of common sense judgement is called for in deciding on a model that will 
fit the situation at hand reasonably well, but at the same time will not be too complicated. 

For instance no process is a random walk to infinity. At some point it is (band) 
limited somehow: The troposphere for a short time looks like random walk. But suppose 
there were no measurements for an entire day? Would our lack of knowledge about the 
troposphere approach infinity? No, of course not. We know that the troposphere zenith 
delay can be predicted to be 2.4 meters with about 2% uncertainty or better. So infinity is 
not the limit. 


Computing the Autocorrelation 


It is simple to compute the autocorrelation for an ordered set of data $a_0, a_1, \ldots, a_{n-1}$,
taken at a constant time interval. First we compute the mean $m$ (the average) and subtract it
from the data; in the sums below $a_i$ denotes the data with the mean removed. Second we
may imagine the data arranged in two rows:

    a_0  a_1  a_2  a_3  a_4
    a_0  a_1  a_2  a_3  a_4

$$\text{Shift} = 0: \qquad \text{auto}(0) = \sum_{i=0}^{n-1} a_ia_i / n.$$

We multiply elements $a_ia_i$ above each other, and add. Now shift the lower row:

    a_0  a_1  a_2  a_3  a_4
         a_0  a_1  a_2  a_3  a_4

$$\text{Shift} = 1: \qquad \text{auto}(1) = \sum_{i=1}^{n-1} a_ia_{i-1} / n.$$

Again we multiply elements above each other. This time the number of terms is diminished
by one. We continue shifting, and each time we divide the sum by the number of data $n$.
The M-file has the following core code:

    function auto = autocorr(a)
    m = mean(a);                 % mean of the data
    n = length(a);               % number of data
    for shift = 0:n-2
        q = 0;
        for t = 1:n-shift        % overlapping terms for this shift
            q = q + (a(t)-m)*(a(t+shift)-m);
        end
        auto(shift+1) = q/n;     % always divide by n
    end

Figure 16.4 Autocorrelation for random data. The peak at zero measures the variance.


The sums for large shifts contain only a small number of terms; the overlap count
$n - \text{shift}$ is small. It is statistical practice to omit the last 20% (or a similar fraction) of the
shifted product sums; they are not so reliable. Luckily enough we are most interested in
the autocorrelation for small shifts as those reveal the nature of the data. So the result of
autocorr is most important for small shifts.

We divided the sums by $n$, and not by the overlap count $n - \text{shift}$. This happens to secure that

$$R = \begin{bmatrix} \text{auto}(0) & \text{auto}(1) & \cdots & \text{auto}(n-1) \\ \text{auto}(1) & \text{auto}(0) & \cdots & \text{auto}(n-2) \\ \vdots & \vdots & \ddots & \vdots \\ \text{auto}(n-1) & \text{auto}(n-2) & \cdots & \text{auto}(0) \end{bmatrix} \tag{16.34}$$

is positive semi-definite.
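For readers working outside MATLAB, the same computation can be sketched in Python. This is a direct transcription of the core loop (shifts 0 to $n-2$, division by $n$), not the book's M-file; the helper `toeplitz_quadratic_form` is an added illustration that evaluates $v^TRv$ for the symmetric Toeplitz matrix built from the computed values.

```python
def autocorr(a):
    """Autocorrelation of the data a for shifts 0, 1, ..., n-2.

    The mean is removed first and every sum is divided by n,
    regardless of how many terms overlap.
    """
    n = len(a)
    m = sum(a) / n
    auto = []
    for shift in range(n - 1):
        q = sum((a[t] - m) * (a[t + shift] - m) for t in range(n - shift))
        auto.append(q / n)
    return auto


def toeplitz_quadratic_form(auto, v):
    """Value of v^T R v where R[i][j] = auto[|i - j|]."""
    size = len(v)
    return sum(auto[abs(i - j)] * v[i] * v[j]
               for i in range(size) for j in range(size))


auto = autocorr([1.0, 2.0, 3.0, 4.0, 5.0])
```

For the data $1, 2, 3, 4, 5$ the mean is 3 and the values for shifts 0 to 3 are $2, 0.8, -0.2, -0.8$. Quadratic forms such as `toeplitz_quadratic_form(auto, [1, -1, 1, -1])` stay nonnegative, which is the property that division by $n$ is meant to secure.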
Figure 16.5 Correlogram for an autocovariance function. The dashed horizontal lines
represent the limits $\pm 2/\sqrt{n}$, $n$ being 80.


Variance Component Model 


Once again we assume that our observations (signal) are stationary. Furthermore we assume that
observations are made at equidistant time intervals. In the following we shall introduce some
useful tools for analyzing autocorrelation functions.

Often we are furnished with a normalized autocorrelation function. The autocorrelation
coefficient $r_k$ is defined as

$$r_k = \frac{R_x(k)}{R_x(0)}, \qquad k = 0, 1, \ldots, n-1. \tag{16.35}$$

The plot of $r_k$ for varying $k$ is called the correlogram for the random process $x_k$. One simple
use of the correlogram is to check whether there is evidence of any serial dependence
in an observed time series. To do this, we use a result due to Bartlett who showed that for
a white noise sequence $a_k$ and for large $n$, $r_k$ is approximately normally distributed with
mean zero and variance $1/n$. Thus values of $r_k$ greater than $2/\sqrt{n}$ in absolute value can
be regarded as significant at about the 5% level. If a large number of $r_k$ are computed it is
likely that some will exceed this threshold even if $a_k$ is a white noise sequence.

Figure 16.5 shows a correlogram for receiver clock offsets. The limits $\pm 2/\sqrt{n}$ are
exceeded for shift 0 to 12 and a longer sequence from $k = 20$ to $k = 56$. So the correlogram
indicates correlation between observations even with shifts up to 56 units of time.
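The correlogram test can be sketched in a few lines. The code below assumes the autocorrelation is computed as earlier in this section (mean removed, sums divided by $n$) and flags the shifts whose coefficient exceeds $2/\sqrt{n}$ in absolute value; the function names and the alternating test sequence are illustrative and are not the receiver clock data of Figure 16.5.

```python
import math


def correlogram(a):
    """Autocorrelation coefficients r_k = R(k) / R(0) for k = 0, ..., n-2."""
    n = len(a)
    m = sum(a) / n

    def auto(k):
        return sum((a[t] - m) * (a[t + k] - m) for t in range(n - k)) / n

    r0 = auto(0)
    return [auto(k) / r0 for k in range(n - 1)]


def significant_shifts(r, n):
    """Shifts k > 0 whose coefficient exceeds the limit 2 / sqrt(n)."""
    limit = 2.0 / math.sqrt(n)
    return [k for k, rk in enumerate(r) if k > 0 and abs(rk) > limit]


# A strongly dependent sequence: +1, -1, +1, -1, ... with n = 80
data = [(-1.0) ** i for i in range(80)]
r = correlogram(data)
flagged = significant_shifts(r, len(data))
```

For this alternating sequence $r_1 = -79/80$, far outside the limit $2/\sqrt{80} \approx 0.22$, so shift 1 is flagged as significant. For white noise about one coefficient in twenty would be flagged by chance.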

In practice random processes often show a non-random trend $\mu_t$. With the noise
term $e_t$ we have

$$Z_t = \mu_t + e_t.$$
We shall demonstrate how to handle this circumstance by splitting the observation $Z_i(t)$ of
the $i$th series into a starting level $L_i(t)$, a random stationary part $M_i(t)$, and noise $N_i(t)$:

$$Z_i(t_k) = L_i(t_0) + M_i(t_k) + N_i(t_k). \tag{16.36}$$

All observations for all experimental subjects $i = 1, 2, \ldots, m$ are taken at equidistant
times $t_0, t_1, \ldots, t_k$. $L_i$ is defined at time $t_0$ while the random variables depend on time $t_k$.

$L_i$ is independent of $L_j$ for $i \ne j$ and is distributed as $N(0, \lambda^2)$. The noise $N_i(t)$ is
distributed as $N(0, \nu^2)$ and independent in time as well as between subjects. Finally $M_i(t_k)$
is distributed as $N(0, \mu^2)$ and independent between subjects and with $R_x(k) = \mu^2r_x(k)$. We
remember that $r_x(0) = 1$ and $r_x(k) \to 0$ for $k \to \infty$. The three components $L_i$, $M_i$, and $N_i$
are assumed mutually independent. Hence the observational variance is

$$R_z(0) = \operatorname{Var}(Z_i) = \lambda^2 + \mu^2 + \nu^2.$$

The covariance between $Z_i(t_l)$ and $Z_j(t_m)$ is

$$\begin{aligned}
\sigma(Z_i(t_l), Z_j(t_m)) &= E\{Z_i(t_l)Z_j(t_m)\} \\
&= E\{(L_i(t_0) + M_i(t_l) + N_i(t_l))(L_j(t_0) + M_j(t_m) + N_j(t_m))\} \\
&= E\{L_iL_j\} + E\{M_i(t_l)M_j(t_m)\} = (\operatorname{Var}(L_i) + \mu^2r_x(t_m - t_l))\,\delta_{ij} \\
&= (\lambda^2 + \mu^2r_x(k))\,\delta_{ij}.
\end{aligned}$$

Note that especially for $i \ne j$ the covariance is zero due to the independence of subjects.
For a moment we dwell on the autocorrelation $R_z(k)$. The observational error $N_i(t_k)$
is not equal to zero; this implies that $R_z(k)$ does not approach $R_z(0)$ for $k \to 0$. Also $R_z(k)$
does not approach 0 for $k \to \infty$. This is caused by the subject specific random variable $L_i$.
From (16.34) we remember $R$ and let $E$ denote a matrix with all ones, then the
covariance matrix can be written as

$$\Sigma = \lambda^2E + \mu^2R + \nu^2I.$$

Now we are ready to introduce the variogram function

$$V(k) = \tfrac{1}{2}E\{(Z_i(t_l) - Z_i(t_m))^2\}. \tag{16.37}$$

We have

$$Z_i(t_l) - Z_i(t_m) = M_i(t_l) - M_i(t_m) + N_i(t_l) - N_i(t_m)$$

and remembering that $M_i$ and $N_i$ are independent we get

$$V(k) = \tfrac{1}{2}E\{(M_i(t_l) - M_i(t_m))^2\} + \tfrac{1}{2}E\{(N_i(t_l) - N_i(t_m))^2\} = \mu^2(1 - r_x(k)) + \nu^2.$$

Figure 16.6 shows a variogram with the three variance components $\lambda^2$, $\mu^2$, and $\nu^2$.
Remember $r_x(0) = 1$ and hence $\lim_{k \to 0} V(k) = \nu^2$. We can read $\nu^2$ from the figure as the
intercept with the ordinate axis.
Figure 16.6 Variogram illustrating the three variance components.


For any stationary, random process we have $r_x(k) \to 0$ for $k \to \infty$. Thus $V(\infty) =
\mu^2 + \nu^2$. So the variance $\mu^2$ within the individual subjects is found as $V(\infty)$ minus $\nu^2$.

The total variance for all observations $Z_i$ is estimated by taking data across experimental
subjects. Again under the assumption of mutually independent processes the total
variance is computed as the mean value of $(Z_i(t_l) - Z_j(t_m))^2/2$ for all $l$ and $m$ and all $i$
and $j$ with $i \ne j$. Let $M$ denote the number of such terms; it is determined by the numbers $n_i$
of observations in the $i$th series:

$$\lambda^2 + \mu^2 + \nu^2 = \frac{1}{2M} \sum_{i \ne j} \sum_{l,m} (Z_i(t_l) - Z_j(t_m))^2.$$

The total variance also is illustrated in Figure 16.6. The variance $\lambda^2$ for the population can
be read at the ordinate axis, too.


Two common examples of autocorrelation functions are the exponential correlation
function

$$r(k) = e^{-\alpha k} \tag{16.38}$$

and the Gaussian correlation function

$$r(k) = e^{-\alpha k^2}. \tag{16.39}$$

In Figure 16.7 we have sketched the autocorrelation function for (16.38) and (16.39) and
the corresponding variograms. The exponential correlation function decreases strongly for
small time differences $k$ while the Gaussian correlation function has a strong correlation
over a larger time. Later it decreases rapidly.
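The shape of the variogram for the two correlation models can be tabulated directly from $V(k) = \mu^2(1 - r(k)) + \nu^2$. The sketch below assumes the exponential form $e^{-\alpha k}$ and the Gaussian form $e^{-\alpha k^2}$ with a single decay parameter $\alpha$; the parameter name, the function names, and the numerical values are choices made for this illustration.

```python
import math


def exponential_correlation(k, alpha):
    """Exponential correlation function exp(-alpha |k|)."""
    return math.exp(-alpha * abs(k))


def gaussian_correlation(k, alpha):
    """Gaussian correlation function exp(-alpha k^2)."""
    return math.exp(-alpha * k * k)


def variogram(k, mu2, nu2, correlation, alpha):
    """V(k) = mu^2 (1 - r(k)) + nu^2 for a given correlation model."""
    return mu2 * (1.0 - correlation(k, alpha)) + nu2


mu2, nu2, alpha = 4.0, 0.5, 1.0
intercept = variogram(0.0, mu2, nu2, exponential_correlation, alpha)   # nu^2
sill = variogram(50.0, mu2, nu2, exponential_correlation, alpha)       # mu^2 + nu^2
```

At shift zero the variogram equals $\nu^2$, and for large shifts it levels off at $\mu^2 + \nu^2$. For small shifts the Gaussian model gives a smaller variogram value than the exponential one, reflecting its stronger correlation over short time differences.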


Figure 16.7 Exponential and Gaussian autocorrelation functions and variograms.


Example 16.16 For a demonstration of the theory we use one-way differences between
a receiver and a satellite. We use satellites with pseudo random noise (PRN) codes 2, 9,
16, 23, 26, and 27. We concentrate on ionospheric delays for epoch $k$. We observe on two
frequencies to cancel the main error in the delay $I_k$:

$$I_k = \frac{(\Phi_{2,k} - \lambda_2N_2) - (\Phi_{1,k} - \lambda_1N_1)}{1 - (f_1/f_2)^2}.$$

Figure 16.8 depicts $I_k$ as calculated for one-ways, their differences and their differences
again which are the so-called double differences. The actual baseline length is 4.6 km. For
one-ways $I_k$ typically varies from 5 to 15 m. Single differences have $I_k$ values of 2.5 to 3 m
and double differences lie between $-0.2$ and $0.2$ m. Note that the mean values of elevation angle
for the individual PRN's are listed in Table 16.3. From this you observe that $I_k$ depends
strongly on the elevation angle for the single PRN.

Now we turn to autocorrelation functions for $I_k$. To eliminate a possible trend in an
observation series one often starts the investigations from differences in time: $I_k - I_{k-1}$.
The autocorrelation for the differenced ionospheric delay for the one-way to PRN 2 is
shown in Figure 16.9 (upper left). The spike at zero equals the variance of a difference
$I_k - I_{k-1}$. Next we compute the autocorrelation for the undifferenced $I_k$. The result is
shown at the upper right. This evidently reflects a remaining systematic term in the delay.
Our ultimate goal is to model this part and subsequently subtract it from the actual delay,
hopefully leaving only white or nearly white noise. Knowing a good model we can with a
high degree of accuracy predict the ionospheric delay in time $\Delta t$ and also with distance $d$.

We continue computing the autocorrelation for $I_k$ for a single difference. When
computing autocorrelation for differences we must remember the rules given in (16.13).
This implies that we have to compute cross-correlations. The lower left of Figure 16.8
shows the result for PRN 2.
In Table 16.3 we have $\sigma_I = 2$ cm for PRN 2 and $\sigma_I = 77$ cm for PRN 16. The numbers in Table 16.3 are produced by calls like


one_way (m) (27) 


and autocorr(x(2,:)’) 


However, these numbers diminish to 4 mm and 9 mm for single differences and to 2 mm and 9 mm for double differences. The general M-file for this example is called oneway_i.


Table 16.3 Autocorrelation of ionosphere delay for one-ways

Columns: Elevation (in °); $\sigma_I$ (in m), master and rover; Shift for first zero, master and rover.
Figure 16.9 Autocorrelation for ionospheric delay to PRN 2. Upper left shows difference in time: $I_k - I_{k-1}$, upper right the one-way, lower left the single difference, and lower right the double difference. Note the various orders of magnitude.


Because of varying geometry, $I_k$ will vary with different baseline lengths $d = 5$, 25, 50, 100, 500, 1000 km, say. So we encourage the interested reader to explore this sort of investigation with the final goal of determining the variance of differential ionosphere $\sigma_I^2$, correlation time $T$, and correlation length $D$ in an expression for the autocorrelation with sample interval $\Delta t$ and baseline length $d$:

$$R_I(k, d) = \sigma_I^2\, e^{-|\Delta t|/T}\, e^{-d/D}. \tag{16.40}$$

This function $R_I$ has been determined from double differenced phase observations in Goad & Yang (1994). The authors assume stationarity and an exponentially correlated process: they estimated $T$ to 64 min, $\sigma_I^2 = 2\ \text{m}^2$, and $D \approx 1500$ km. In general you may fit a Gauss-Markov process to the parameters $\sigma^2$ and $\alpha$. At $k = 0$ we fit $\sigma^2$, and at the $1/e$-point we fit $\alpha$.
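To get a feeling for the size of these numbers, the autocorrelation can be evaluated for a few time separations and baseline lengths. This is a minimal sketch, assuming the form $\sigma_I^2 e^{-|\Delta t|/T} e^{-d/D}$ with the values quoted from Goad & Yang (1994); the function name and argument units (minutes, kilometers) are our own choices.

```python
import math

def ionosphere_autocorrelation(dt_min, d_km,
                               sigma2=2.0, T_min=64.0, D_km=1500.0):
    """Autocorrelation (in m^2) of the differential ionospheric delay.

    dt_min : time separation in minutes
    d_km   : baseline length in kilometers
    The defaults are the values quoted in the text:
    sigma_I^2 = 2 m^2, T = 64 min, D = 1500 km.
    """
    return sigma2 * math.exp(-abs(dt_min) / T_min) * math.exp(-d_km / D_km)

for d in [5, 25, 50, 100, 500, 1000]:
    values = [ionosphere_autocorrelation(dt, d) for dt in (0, 15, 64)]
    print(f"d = {d:5d} km:", "  ".join(f"{v:.3f}" for v in values))
```

At zero separation in time and distance the function returns the variance $\sigma_I^2$; after one correlation time $T$ (or one correlation length $D$) it has dropped by the factor $1/e$.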


It is known that the ionospheric delay has a distinct daily variation. In Klobuchar (1996) we find an approximate expression for $I$ in meters as function of local time $t$ in hours:

$$I = 2.1 + 0.75 \cos\bigl((t - 14)\,2\pi/28\bigr).$$

This value is valid in the direction of zenith. To get the increased value of $I$ in a direction with elevation angle $El$ in half circles (the range of $El$ is 0 to 0.5; half circles times $\pi$ equals radians) we have to multiply by the obliquity factor

$$F(El) = 1 + 16\,(0.53 - El)^3.$$

This expression is based on a formula given in ICD-GPS-200 (1991). Figure 16.10 shows the graphs of $I$ and $F(El)$. The small constant level of delay equal to 2.1 m represents the delay at night. As the sun rises and sets, the ionospheric model gives rise to the cosine-shaped pulse for daytime.

Figure 16.10 Ionospheric delay model as function of local time and elevation angle.
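The two formulas combine into a simple slant-delay model. The sketch below is illustrative only: the function names are ours, and we assume that the cosine term is switched off when its phase exceeds a quarter period, which is how we read the description of a constant night level with a cosine-shaped daytime pulse.

```python
import math

def zenith_delay(t_hours):
    """Approximate zenith ionospheric delay in meters at local time t (hours)."""
    phase = (t_hours - 14.0) * 2.0 * math.pi / 28.0
    # Assumption: outside the half period centered at 14 h the delay
    # stays at the constant night level of 2.1 m.
    if abs(phase) < math.pi / 2.0:
        return 2.1 + 0.75 * math.cos(phase)
    return 2.1

def obliquity_factor(el_half_circles):
    """Obliquity factor F(El) = 1 + 16 (0.53 - El)^3, El in half circles."""
    return 1.0 + 16.0 * (0.53 - el_half_circles) ** 3

def slant_delay(t_hours, elevation_deg):
    """Delay in meters along a ray with the given elevation in degrees."""
    el = elevation_deg / 180.0          # degrees -> half circles
    return zenith_delay(t_hours) * obliquity_factor(el)

for hour in (2, 8, 14, 20):
    print(hour, round(zenith_delay(hour), 3),
          round(slant_delay(hour, 15.0), 3))
```

The peak zenith delay of 2.85 m occurs at 14 h local time, and the obliquity factor grows from about 1 at zenith to about 3.4 at the horizon.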


When interpreting an autocorrelation function it is useful to remember the following im- 
portant properties: 


Maximum value  The autocorrelation function has a maximum at zero shift:

$$E\{x^2(t)\} = R_x(0) \ge |R_x(k)|.$$

Symmetry conditions  The autocorrelation function is an even function

$$R_x(k) = R_x(-k)$$

while the cross-correlation satisfies

$$R_{xy}(k) = R_{yx}(-k).$$

Mean-square value  The cross-correlation is bounded by

$$|R_{xy}(k)|^2 \le R_x(0)\,R_y(0) \le \tfrac{1}{4}\bigl(R_x(0) + R_y(0)\bigr)^2.$$

Periodic component  The autocorrelation has a periodic component if $x$ has one.
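These properties also hold for the usual biased sample estimates, which makes them easy to check numerically. The following sketch is our own illustration (the test signals and the name `cross_correlation` are assumptions, not taken from the M-files of the book).

```python
import math

def cross_correlation(x, y, k):
    """Biased sample estimate of R_xy(k) = E{x(t) y(t+k)} for integer shift k."""
    n = len(x)
    if k >= 0:
        pairs = zip(x[:n - k], y[k:])
    else:
        pairs = zip(x[-k:], y[:n + k])
    return sum(a * b for a, b in pairs) / n

# Two illustrative signals sharing a component with period 20.
n = 200
x = [math.sin(2 * math.pi * t / 20) + 0.5 * math.cos(2 * math.pi * t / 7)
     for t in range(n)]
y = [math.cos(2 * math.pi * t / 20) for t in range(n)]

rx0 = cross_correlation(x, x, 0)
ry0 = cross_correlation(y, y, 0)
shifts = range(-50, 51)

maximum_at_zero = all(rx0 >= abs(cross_correlation(x, x, k)) for k in shifts)
even_function = all(
    abs(cross_correlation(x, x, k) - cross_correlation(x, x, -k)) < 1e-12
    for k in shifts)
cross_symmetry = all(
    abs(cross_correlation(x, y, k) - cross_correlation(y, x, -k)) < 1e-12
    for k in shifts)
bounded = all(
    cross_correlation(x, y, k) ** 2 <= rx0 * ry0 <= 0.25 * (rx0 + ry0) ** 2
    for k in shifts)

print(maximum_at_zero, even_function, cross_symmetry, bounded)
```

The periodic component of `x` reappears in its autocorrelation: the values at shifts 20 and 40 are local maxima.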
Remark 16.1 (Filter theoretical interpretation of the technique of differencing) We may look at double differenced data $D$ as made up of two single difference measurements $S_i$: $D = S_1 - S_2$. Each of these is made up simplistically as $S_i = \rho_i + t$ where $\rho_i$ is pseudorange and $t$ is receiver clock offset converted to length by multiplication by $c$. Note that $t$ is the same in both. So double differencing does remove this clock error.

We illustrate a few basic facts about differencing. Start with two single differences:

$$S_1 = \rho_1 + t$$
$$S_2 = \rho_2 + t$$

and their double difference:

$$S_2 - S_1 = \rho_2 - \rho_1.$$

For the double difference the situation has changed. We have one less equation and cannot estimate $t$. You could estimate from single differences but the a priori clock variance $\sigma^2_{\text{clock}}$ must be infinite; otherwise you introduce a third measurement, which you do not have in the double difference case. So one case has $S_1$, $S_2$, and $\sigma^2_{\text{clock}}$. This is equivalent to $S_1$, $D$, and $\sigma^2_{\text{clock}}$. If $\sigma^2_{\text{clock}} = \infty$ it really means it is of no value. So this is really equivalent to $S_1$ and $D$. Now since the double differences do not involve the clock explicitly, the only use to be made of $S_1$ is to estimate the clock. Since a clock estimate is not needed by double differencing, $D$ stands by itself in estimating position.

However, if $\sigma^2_{\text{clock}} \ne \infty$ then this couples all three measurements, in which case all three should be processed together. It does not matter whether $S_1$, $S_2$, $t$ or $S_1$, $D$, $t$ or $S_2$, $D$, $t$ is used. If you treat clocks as having infinite variance at each epoch and then plot their estimates, you will see that clocks do not drift with infinite variance. So theoretically it is correct that we do know something about clock drift, and using double differences alone does not allow us to take advantage of this knowledge (because $t$ cancels).

Then the difficult question is "how to model (receiver) clock drift randomly?" Clearly different receivers have vastly differing (usually quartz) clocks. The best opportunity would be for the International GPS Service for Geodynamics (IGS) to drive their receivers used for orbit calculations with rubidium or, even better, cesium oscillators. Then a more descriptive random model with smaller variances could prove useful. Quartz clocks drift so wildly relative to centimeter positioning requirements that trying to model them would add little to improving positions. So we may conclude:

- Double differences can be conceived as a filter where the clock behavior is modeled as white noise with large variances.

- Using single differences with analytical prediction of receiver clock offset is much better than using a filter.


Aspects of Random Processes 


We have presented three basic models: random walk, random ramp, and an exponentially correlated process. Each one introduces a special pattern for how random errors interact with the model. If the model is chosen correctly the residuals should be white noise. A good measure for the randomness of errors is to look at their autocorrelation. Or in other words: A given autocorrelation function is based on a given model. If the model fits the data, the error autocorrelation function ideally is that of white noise.

Pure observational errors will show a very clear spike at zero. Systematic errors will give an autocorrelation that remains large, even far away from zero. To demonstrate the autocorrelation for random data we call the M-file model_g(randn(1,300)). The result was shown in Figure 16.4. Here we have an autocorrelation function which has a spike at zero. This spike is a measure for the variance of observation. The global features of the autocorrelation function (away from shift = 0) originate from the model. If the autocorrelation drops off at a certain distance from zero we speak of a band limited process.
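The contrast between a spike at zero and an autocorrelation that remains large can be reproduced with simulated data. The sketch below is a stand-in for this kind of experiment, not the M-file model_g itself: it compares white noise with its running sum (a random walk), using a normalized sample autocorrelation. The seed and series length are arbitrary.

```python
import random

def normalized_autocorrelation(x, max_lag):
    """Sample autocorrelation of the mean-removed series, scaled so r[0] = 1."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    r0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n / r0
            for k in range(max_lag + 1)]

rng = random.Random(0)                      # fixed seed for repeatability
white = [rng.gauss(0.0, 1.0) for _ in range(2000)]

walk, total = [], 0.0
for w in white:                             # random walk = summed white noise
    total += w
    walk.append(total)

r_white = normalized_autocorrelation(white, 10)
r_walk = normalized_autocorrelation(walk, 10)

print("white noise:", " ".join(f"{r:6.3f}" for r in r_white))
print("random walk:", " ".join(f"{r:6.3f}" for r in r_walk))
```

The white noise series shows the spike at zero shift followed by values near zero, while the random walk, which is systematic from the point of view of a white noise model, stays strongly correlated over all ten shifts.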

Sometimes we are lucky to know the physics behind a problem. Then it is often 
given in terms of a differential equation. 


Example 16.17 We describe a procedure starting from a differential equation, leading to the discrete state equation. The $z$-transform exhibits poles of the solution and these poles determine whether the filter is stable or not.

The actual differential equation describes a forced motion for a body with mass $m$, damping constant $d$, and spring modulus $c$, all under the influence of an external force $f$:

$$m\ddot{x} + d\dot{x} + cx = f. \tag{16.41}$$

The actual displacement of the body is $x(t) = \int \dot{x}(t)\,dt$ and we discretize to get $x_k \approx x_{k-1} + \dot{x}_{k-1}$ with $\Delta t = 1$. Furthermore

$$\dot{x}(t) = \int \ddot{x}\,dt \quad\text{and}\quad \dot{x}_k \approx \dot{x}_{k-1} + \ddot{x}_{k-1}. \tag{16.42}$$

The discrete form of equation (16.41) is

$$\ddot{x}_{k-1} = -\frac{d}{m}\,\dot{x}_{k-1} - \frac{c}{m}\,x_{k-1} + \frac{f}{m}. \tag{16.43}$$

Insertion of (16.43) into (16.42) yields

$$\dot{x}_k = \dot{x}_{k-1} + \Bigl(-\frac{d}{m}\,\dot{x}_{k-1} - \frac{c}{m}\,x_{k-1} + \frac{f}{m}\Bigr).$$

Hence in matrix form the state equation becomes

$$\begin{bmatrix} x_k \\ \dot{x}_k \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ -c/m & 1 - d/m \end{bmatrix} \begin{bmatrix} x_{k-1} \\ \dot{x}_{k-1} \end{bmatrix} + \begin{bmatrix} 0 \\ f/m \end{bmatrix}.$$

Random walk is the special case with $c = 0$ and $d = 0$. There is no spring force and no damping:

$$\begin{bmatrix} x_k \\ \dot{x}_k \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_{k-1} \\ \dot{x}_{k-1} \end{bmatrix} + \begin{bmatrix} 0 \\ f/m \end{bmatrix}. \tag{16.44}$$

We substitute the state variables by $X_{1,k} = x_k$, $X_{2,k} = \dot{x}_k$, and write $F = f/m$:

$$\begin{bmatrix} X_{1,k} \\ X_{2,k} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ -c/m & 1 - d/m \end{bmatrix} \begin{bmatrix} X_{1,k-1} \\ X_{2,k-1} \end{bmatrix} + \begin{bmatrix} 0 \\ F \end{bmatrix}. \tag{16.45}$$

The displacement of the body from static equilibrium is $X_{1,k}$, and $X_{2,k}$ is the instantaneous velocity of the body.

The solution of this discrete system is found via $z$-transforms $X_1(z) = \sum_k X_{1,k}\, z^{-k}$ and $X_2(z) = \sum_k X_{2,k}\, z^{-k}$. The transform of (16.45) is

$$z X_1(z) = X_1(z) + X_2(z)$$

$$z X_2(z) = -\frac{c}{m}\, X_1(z) + \Bigl(1 - \frac{d}{m}\Bigr) X_2(z) + F(z).$$

We rewrite the last equation as

$$X_2(z)\Bigl(z - \Bigl(1 - \frac{d}{m}\Bigr)\Bigr) = -\frac{c}{m}\, X_1(z) + F(z).$$

Then insertion of $X_2(z)$ into the first equation gives

$$z X_1(z) = X_1(z) + \frac{-(c/m)\,X_1(z) + F(z)}{z - (1 - d/m)}$$

or

$$\Bigl(z^2 - \Bigl(2 - \frac{d}{m}\Bigr) z + \Bigl(1 - \frac{d}{m} + \frac{c}{m}\Bigr)\Bigr) X_1(z) = F(z).$$

The position $X_1(z)$ of the body is given by

$$\frac{X_1(z)}{F(z)} = \frac{1}{z^2 - (2 - d/m)\,z + (1 - d/m + c/m)}.$$

The random walk case with $c = 0$ and $d = 0$ gives

$$\frac{X_1(z)}{F(z)} = \frac{1}{z^2 - 2z + 1} = \frac{1}{(z - 1)(z - 1)}.$$

The function $X_1(z)$ has a multiple pole at $z = 1$. In general if all poles are within the unit circle we deal with a stable filter. If one or more poles are outside the unit circle we have an unstable filter. The present case with $z = 1$ is most delicate. The neighborhood of $z = 1$ is a true "minefield".
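The poles are the roots of the denominator $z^2 - (2 - d/m)z + (1 - d/m + c/m)$, so the stability test can be carried out numerically for any choice of $m$, $d$, $c$. A minimal sketch follows; the function names and the sample parameter values are ours.

```python
import cmath

def poles(m, d, c):
    """Roots of z^2 - (2 - d/m) z + (1 - d/m + c/m) = 0."""
    p = -(2.0 - d / m)                  # coefficient of z
    q = 1.0 - d / m + c / m             # constant term
    root = cmath.sqrt(p * p - 4.0 * q)
    return ((-p + root) / 2.0, (-p - root) / 2.0)

def is_stable(m, d, c):
    """True when every pole lies strictly inside the unit circle."""
    return all(abs(z) < 1.0 for z in poles(m, d, c))

cases = {
    "random walk (c = 0, d = 0)": (1.0, 0.0, 0.0),
    "damped spring":              (1.0, 0.5, 0.1),
    "spring without damping":     (1.0, 0.0, 0.1),
}
for name, (m, d, c) in cases.items():
    moduli = [round(abs(z), 4) for z in poles(m, d, c)]
    print(f"{name:28s} |poles| = {moduli}  stable: {is_stable(m, d, c)}")
```

The random walk puts both poles exactly on the unit circle at $z = 1$. With this simple discretization the undamped spring already has poles outside the unit circle, while moderate damping pulls them inside.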

The z-transform of the impulse response is the transfer function H(z). Input and 
output are connected by H (z) and it is the key to an optimal control. This leads further to 
explicit expressions for autocorrelation and power spectral density. However, this relatively 
simple example is at the limit of complexity for finding closed form solutions to algebraic 
equations by purely algebraic means. So we do not want to present more theory, because 
in most real world examples numbers rather than formulas have to describe reality. 
Example 16.18 We want to combine a random walk and white noise in the dynamic linear model

System equation: $x_k = x_{k-1} + \epsilon_k$, $\epsilon_k \sim N(0, \sigma_\epsilon^2)$
Observation equation: $b_k = x_k + e_k$, $e_k \sim N(0, \sigma_e^2)$.

This is the simplest possible model, and it has a surprisingly wide range of applications.

Example 16.19 Extended filter for the phase observation $x = \varphi$:

System equation: $x_k = x_{k-1} + \epsilon_k$, $\epsilon_k \sim N(0, \sigma_\epsilon^2)$
Observation equation: $b_k = \text{constant} + \sin(f_1 \Delta t + x_k) + e_k$, $e_k \sim N(0, \sigma_e^2)$.

Our first development of this example is in the M-file mangouli.


We summarize this chapter by repeating the essential points for a Gauss-Markov process:

Autocorrelation  $R_x(k) = \sigma^2 e^{-\alpha|k|}$

Process  $x_k = e^{-\alpha|\Delta t|}\, x_{k-1} + \epsilon_k$

Variance  $\sigma_\epsilon^2 = E\{\epsilon_k^2\} = \sigma^2\bigl(1 - e^{-2\alpha|\Delta t|}\bigr)$, to be used on the diagonal of the filter covariance matrix $\Sigma_\epsilon$, see next chapter!
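The variance formula is exactly what keeps the process stationary: if $x_{k-1}$ has variance $\sigma^2$, then $x_k$ has variance $e^{-2\alpha|\Delta t|}\sigma^2 + \sigma_\epsilon^2 = \sigma^2$ again. The sketch below propagates the variance through many steps to confirm this; the numerical values ($\sigma^2 = 2$, $\alpha = 1/64$, $\Delta t = 1$) are only illustrative.

```python
import math

def gauss_markov_parameters(sigma2, alpha, dt):
    """Transition factor and driving-noise variance of a Gauss-Markov process."""
    phi = math.exp(-alpha * abs(dt))                          # x_k = phi * x_{k-1} + eps_k
    q = sigma2 * (1.0 - math.exp(-2.0 * alpha * abs(dt)))     # E{eps_k^2}
    return phi, q

def propagate_variance(initial_variance, phi, q, steps):
    """Variance of x_k after the given number of steps of the process."""
    variance = initial_variance
    for _ in range(steps):
        variance = phi * phi * variance + q
    return variance

sigma2, alpha, dt = 2.0, 1.0 / 64.0, 1.0
phi, q = gauss_markov_parameters(sigma2, alpha, dt)

stationary = propagate_variance(sigma2, phi, q, 100)   # start at sigma^2
from_zero = propagate_variance(0.0, phi, q, 2000)      # start from a known state

print(phi, q, stationary, from_zero)
```

Starting from the stationary variance nothing changes, and starting from an exactly known state the variance grows toward $\sigma^2$ with the correlation time $1/\alpha$.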
KALMAN FILTERS


17.1 Updating Least Squares 


The key to the Kalman filter is the idea of updating. New observations come in, and they change our best least squares estimate of the parameters $x$. We want to compute that change, to update the estimate $\hat{x}$. The process will be efficient if we can express the new estimate as a linear combination of the old estimate $\hat{x}_{\text{old}}$ and the new observation $b_{\text{new}}$:

$$\hat{x}_{\text{new}} = L\,\hat{x}_{\text{old}} + K\,b_{\text{new}}. \tag{17.1}$$

This section will make four key points as it introduces the Kalman filter. Then the following sections explain and illustrate these points in detail.

The first point is that the process is recursive. We do not store the old observations $b_{\text{old}}$! Those measurements were already used in the estimate $\hat{x}_{\text{old}}$. Built into equation (17.1) is the expectation (or hope) that all the information from $b_{\text{old}}$ that we need for $\hat{x}_{\text{new}}$ is available in $\hat{x}_{\text{old}}$. Since $x$ is a much shorter vector than $b$ (which is growing in length with each new measurement) the filter is efficient.

The second point is that the update formula (17.1) can be written (and derived) in many equivalent ways. This makes a lot of expositions of the Kalman filter difficult to follow. The frustrated reader finally just asks for the damn formula. But some of the variations help the intuition, for example by separating the "prediction" from the "correction":

$$\hat{x}_{\text{new}} = \hat{x}_{\text{old}} + K\,(b_{\text{new}} - A_{\text{new}}\hat{x}_{\text{old}}). \tag{17.2}$$

Direct comparison with (17.1) gives the relation $L = I - KA_{\text{new}}$. The Kalman gain matrix $K$ becomes the crucial matrix to identify. It multiplies the mismatch $b_{\text{new}} - A_{\text{new}}\hat{x}_{\text{old}}$ (the innovation) between old estimates and new measurements. From the estimate $\hat{x}_{\text{old}}$ we would have predicted measurements $A_{\text{new}}\hat{x}_{\text{old}}$. The difference between this prediction and the actual $b_{\text{new}}$ is multiplied by $K$ to give the correction in $\hat{x}$. That is (17.2).

This gain matrix $K$ involves the statistics of $b_{\text{new}}$ and $\hat{x}_{\text{old}}$, because they tell how much weight to give to this mismatch. (Thus the crucial covariance matrix $P$ must also be updated! This will be our third point.) Variations of the updating formula are introduced for the sake of numerical stability, when this formula is to be applied many times. All the equivalent forms must be related by matrix identities and we will try to make those clear.
The third key point is that the reliability of $\hat{x}$ is a crucial part of the output. This is the error covariance matrix $P = \Sigma_{\hat{x}}$ for the estimate. $P$ tells us the statistical properties of $\hat{x}$ based on the statistical properties of $b$. This matrix does not depend on the particular measurements $b$ or measurement errors $e$. Those are only random samples from a whole population of possible measurements and errors. We assume a Gaussian (normal) distribution of this error population, with a known covariance matrix $\Sigma_e$. (Well, barely known. We may struggle to find a realistic $\Sigma_e$.) The least-squares solution $\hat{x}$ to a static equation $Ax = b$ is weighted by $\Sigma_e^{-1}$ and $P = (A^T\Sigma_e^{-1}A)^{-1}$ is the error covariance matrix for $\hat{x}$. It is this covariance that we update when measurements $b_{\text{new}}$ arrive with $\Sigma_{e,\text{new}}$:

$$P_{\text{new}} = (I - KA)\,P_{\text{old}}\,(I - KA)^T + K\,\Sigma_{e,\text{new}}\,K^T. \tag{17.3}$$

To obtain this formula from (17.1) with $L = I - KA$ we assume that the errors $e_{\text{new}}$ in $b_{\text{new}}$ are statistically independent from the errors $e_{\text{old}}$:

$$\text{The covariance matrix for } b = \begin{bmatrix} b_{\text{old}} \\ b_{\text{new}} \end{bmatrix} \text{ is } \Sigma_e = \begin{bmatrix} \Sigma_{e,\text{old}} & 0 \\ 0 & \Sigma_{e,\text{new}} \end{bmatrix}.$$

Note We will go through these steps again in detail. The goal of this first discussion is to see how the pieces of the Kalman filter come together. The reader will not forget that the inverse covariance matrix $\Sigma_e^{-1}$ enters as a weight in the least-squares problem. Thus $\hat{x}_{\text{old}}$ involved the covariance $\Sigma_{e,\text{old}}$ for $b_{\text{old}}$. Similarly $\hat{x}_{\text{new}}$ will also involve the covariance $\Sigma_{e,\text{new}}$ for $b_{\text{new}}$. Again we hope and expect that it will not be necessary to store $\Sigma_{e,\text{old}}$, since it has already been used to compute $\hat{x}_{\text{old}}$ and $P_{\text{old}}$.


We come to the fourth point, which is fundamentally important. The least-squares problems described up to now have been static. We have been updating our estimate $\hat{x}$ of the same parameter vector $x$. It is recursive least squares and we could have done it earlier in the book, without mentioning Kalman. With exact measurements, $x = \hat{x}_{\text{old}} = \hat{x}_{\text{new}}$ would solve both the old and new overdetermined systems in this static situation:

$$A_{\text{old}}\,x = b_{\text{old}} \quad\text{and}\quad \begin{bmatrix} A_{\text{old}} \\ A_{\text{new}} \end{bmatrix} \begin{bmatrix} x \end{bmatrix} = \begin{bmatrix} b_{\text{old}} \\ b_{\text{new}} \end{bmatrix}. \tag{17.4}$$

We have been adding new rows to the system but not new columns. If the best estimate $\hat{x}_{\text{old}}$ happened to be exactly consistent with the new measurements, so that $A_{\text{new}}\hat{x}_{\text{old}} = b_{\text{new}}$, then the measurements $b_{\text{new}}$ would not change that old estimate. This is clear from (17.2): $\hat{x}_{\text{new}}$ would equal $\hat{x}_{\text{old}}$ because the gain matrix $K$ will be multiplying the zero vector to give a zero correction.

The Kalman filter operates on dynamic problems. The new state vector $x_k$ (for example, the position of a GPS receiver) is not generally the same as the state vector $x_{k-1}$. The state is changing, the receiver is moving, and we assume a linear equation for that change in state, with its own error $\epsilon_k$ (some texts call it $\epsilon_{k-1}$):

$$x_k = F_{k-1}\,x_{k-1} + \epsilon_k. \tag{17.5}$$
This is a discrete-time linear random process. The index $k$ indicates the new state $x_k$ and the newly measured quantities $b_k$ and the errors $e_k$ in those new measurements:

$$b_k = A_k x_k + e_k. \tag{17.6}$$

Thus $e_k$ still shows the mismatch in the prediction, but the prediction is using (of course) the state equation (17.5) for $x_k$. And that state equation includes an error (or noise) $\epsilon_k$.

Since (17.5) is a one-step equation (a first-order process), involving only $x_{k-1}$ and not the earlier states $x_0, \ldots, x_{k-2}$, we look for an estimate of $x_k$ that does not involve the very old estimates $\hat{x}_0, \ldots, \hat{x}_{k-2}$. The update comes in two stages, a prediction using the state equation (17.5), and a correction using the observation (17.6):

$$\begin{aligned} \text{Prediction:}\quad & \hat{x}_{k|k-1} = F_{k-1}\,\hat{x}_{k-1|k-1} \\ \text{Correction:}\quad & \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\,(b_k - A_k\hat{x}_{k|k-1}). \end{aligned} \tag{17.7}$$

We also want a two-stage update of the error covariance matrix $P$ (or its inverse). That will be an extremely important equation. We present the prediction $P_{k|k-1}$, from which (17.3) gives the correction $P_{\text{new}} = P_{k|k}$. The prediction comes (of course) from the state equation:

$$P_{k|k-1} = F_{k-1}\,P_{k-1|k-1}\,F_{k-1}^T + \Sigma_{\epsilon,k}. \tag{17.8}$$

The combined step to $P_{k|k}$ uses the new covariances $\Sigma_{\epsilon,k}$ and $\Sigma_{e,k}$ that describe the error populations from which $\epsilon_k$ and $e_k$ are drawn.

Important We assume that $\epsilon_k$ is statistically independent of all other $\epsilon_j$ and all $e_j$. The whole covariance matrix $\Sigma$ is block-diagonal, with blocks for every $\epsilon_k$ and $e_k$. One block describes errors in the state equation, the other describes errors in measurement.


The Complete System $\mathcal{A}\mathbf{x} = \mathbf{b} - \mathbf{e}$

It may be useful to the reader if we write down the complete system of equations through time $t_k$. The initial estimate $\hat{x}_0$ is given, with its own error covariance matrix $P_{0|0}$. Then each time step introduces two new block rows in the big matrix $\mathcal{A}$ and one new block column, corresponding to the new $x_k$ in the list $\mathbf{x}_k$ of unknown state vectors:

$$\begin{bmatrix} A_0 & & & \\ -F_0 & I & & \\ & A_1 & & \\ & \ddots & \ddots & \\ & & -F_{k-1} & I \\ & & & A_k \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_k \end{bmatrix} = \begin{bmatrix} b_0 \\ 0 \\ b_1 \\ \vdots \\ 0 \\ b_k \end{bmatrix} - \begin{bmatrix} e_0 \\ \epsilon_1 \\ e_1 \\ \vdots \\ \epsilon_k \\ e_k \end{bmatrix}. \tag{17.9}$$

Looking at those last two rows, you see the state equation (17.5) and the observation equation (17.6). It is this complete system that the Kalman filter solves recursively by weighted least squares. We will write equation (17.9) as $\mathcal{A}\mathbf{x} = \mathbf{b} - \mathbf{e}$ or more precisely $\mathcal{A}_k\mathbf{x}_k = \mathbf{b}_k - \mathbf{e}_k$. The calligraphic $\mathcal{A}$ and the bold $\mathbf{x}$, $\mathbf{b}$, $\mathbf{e}$ indicate the complete system up to $t_k$.

The weights are the inverses of the covariance matrices of the errors $\epsilon_k$ and $e_k$. The output is the set of state estimates $\hat{x}_0, \ldots, \hat{x}_k$ and their covariance matrices $P_0, \ldots, P_k$. Those could come from the usual formulas like $P = (\mathcal{A}^T\Sigma^{-1}\mathcal{A})^{-1}$, but they don't. The recursive formulas determine $\hat{x}_k$ and $P_k$ from $b_k$ and $\hat{x}_{k-1}$ and $P_{k-1}$ and $\Sigma_{\epsilon,k}$ and $\Sigma_{e,k}$.

The possibility of a simple update is built into the structure of that complete matrix $\mathcal{A}$ (which is block bidiagonal). The block diagonal weight matrix $C = \Sigma^{-1}$ contains the inverses of all the blocks $\Sigma_\epsilon$ and $\Sigma_e$. The large matrix $\mathcal{A}^T\Sigma^{-1}\mathcal{A}$ is block tridiagonal, and tridiagonal matrices form the classical situation for recursive formulas. Without a tridiagonal matrix (if the new state equation for $x_k$ involved states earlier than $x_{k-1}$), a one-step update would be impossible.
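The tridiagonal structure is easy to see in the smallest case, where every state is a single number. The sketch below builds the complete matrix for scalar states and forms the weighted normal matrix $\mathcal{A}^TC\mathcal{A}$. All names and the values $F = 0.9$, $A_k = 1$ and the two variances are our own illustrative choices.

```python
def complete_system_matrix(F, A_obs, steps):
    """Rows A_0, then (-F, 1) and A_j for j = 1..steps, for scalar states."""
    n = steps + 1                      # unknowns x_0, ..., x_steps
    first = [0.0] * n
    first[0] = A_obs
    rows = [first]
    for j in range(1, n):
        state_row = [0.0] * n          # state equation: -F x_{j-1} + x_j
        state_row[j - 1] = -F
        state_row[j] = 1.0
        obs_row = [0.0] * n            # observation equation: A_j x_j
        obs_row[j] = A_obs
        rows.extend([state_row, obs_row])
    return rows

def normal_matrix(rows, weights):
    """Weighted normal matrix A^T C A with diagonal weight matrix C."""
    n = len(rows[0])
    return [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(n)] for i in range(n)]

steps = 3
var_state, var_obs = 0.5, 2.0
A_big = complete_system_matrix(F=0.9, A_obs=1.0, steps=steps)
weights = [1.0 / var_obs] + [1.0 / var_state, 1.0 / var_obs] * steps
N = normal_matrix(A_big, weights)

for row in N:
    print(" ".join(f"{v:7.3f}" for v in row))
```

Every entry more than one place off the diagonal is zero, because each row of the complete matrix touches at most two neighboring states.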

The variety of Kalman filter formulas comes from the variety of ways that we can represent the matrix $\mathcal{A}^TC\mathcal{A}$. This block tridiagonal matrix is symmetric positive definite. It has a Cholesky factorization $LL^T$ into block bidiagonal matrices. We can update $L$ for a more stable "square root algorithm." Or we can update the Gram-Schmidt factorization $QR$ that orthogonalizes the columns of $\mathcal{A}$.

All those updates can be derived by manipulating the complete matrix $\mathcal{A}$. It is important to recognize that (17.9) is the system we are solving (by weighted least squares). But the derivation is much easier if we assume an expression involving only the most recent times $k - 1$ and $k$, and determine the matrices (especially Kalman's gain matrix $K_k$) in that update formula. Unlike the static case, the gain matrix $K_k$ will now depend on the discrete time index $k$.


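Example 17.1 (Recursive update for a static one-parameter problem) A doctor takes $k$ independent and equally reliable readings $b_1, \ldots, b_k$ of your pulse rate $x$. Each squared measurement error $(x - b_j)^2$ has expected value $\sigma^2$. We want to compute the best estimate $\hat{x}_k$ and its variance $P_k = E\{(x - \hat{x}_k)^2\}$. First we compute $\hat{x}_k$ and $P_k$ directly, then we compute recursively from $\hat{x}_{k-1}$ and $P_{k-1}$.

The $k$ measurement equations are simply

$$x = b_1,\quad x = b_2,\quad \ldots,\quad x = b_k \qquad\text{or}\qquad \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} x = \begin{bmatrix} b_1 \\ \vdots \\ b_k \end{bmatrix}.$$

Thus $A$ (we reserve $\mathcal{A}$ for the dynamic case including a state equation) is a column of ones. The covariance matrix is $\Sigma_e = \sigma^2 I$, diagonal because the readings were independent. The 1 by 1 matrix $P^{-1} = A^T\Sigma_e^{-1}A$ and the right side $A^TCb$ are

$$A^T\Sigma_e^{-1}A = \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix} \frac{I}{\sigma^2} \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} = \frac{k}{\sigma^2} \qquad\text{and}\qquad A^TCb = \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix} \frac{I}{\sigma^2} \begin{bmatrix} b_1 \\ \vdots \\ b_k \end{bmatrix} = \frac{b_1 + \cdots + b_k}{\sigma^2}.$$

The best unbiased estimate, as everyone expects, is the average of the readings:

$$\hat{x}_k = P\,(A^TCb) = \frac{\sigma^2}{k}\Bigl(\frac{b_1 + \cdots + b_k}{\sigma^2}\Bigr) = \frac{b_1 + \cdots + b_k}{k}. \tag{17.10}$$

The 1 by 1 matrix $E\{(x - \hat{x}_k)^2\}$ is $P = (A^TCA)^{-1} = \sigma^2/k$. By taking $k$ independent measurements with variance $\sigma^2$, the variance of the average value $\hat{x}_k$ is reduced to $\sigma^2/k$. We can show that without matrices and without Kalman!

But now do those computations recursively. The step begins with the correct values at time $k - 1$:

$$\hat{x}_{k-1} = \frac{1}{k-1}\,(b_1 + \cdots + b_{k-1}) \qquad\text{and}\qquad P_{k-1} = \frac{\sigma^2}{k-1}. \tag{17.11}$$

The new observation $b_k$ brings a correction = prediction error = innovation, and remember that $A_{\text{new}}$ is just $[\,1\,]$:

$$\hat{x}_k = \hat{x}_{k-1} + K_k\,(b_k - \hat{x}_{k-1}).$$

The extra measurement reduces $P$ and increases $P^{-1}$:

$$P_k^{-1} = P_{k-1}^{-1} + \frac{1}{\sigma^2} = \frac{k-1}{\sigma^2} + \frac{1}{\sigma^2} = \frac{k}{\sigma^2}. \tag{17.12}$$

We will explain that extra term $1/\sigma^2$. It is computed as $A_{\text{new}}^T\Sigma_{e,\text{new}}^{-1}A_{\text{new}} = [\,1\,]\,[\,1/\sigma^2\,]\,[\,1\,]$. By working with $P^{-1}$ instead of $P$, the update (17.12) is very simple. From $P_k$ we obtain the gain matrix $K_k$ and the correction to $\hat{x}$:

$$K_k = P_k\,A_{\text{new}}^T\,\Sigma_{e,\text{new}}^{-1} = \Bigl[\frac{\sigma^2}{k}\Bigr]\,[\,1\,]\,\Bigl[\frac{1}{\sigma^2}\Bigr] = \frac{1}{k} \tag{17.13}$$

$$\hat{x}_k = \hat{x}_{k-1} + \frac{1}{k}\,(b_k - \hat{x}_{k-1}) = \frac{k-1}{k}\cdot\frac{1}{k-1}\,(b_1 + \cdots + b_{k-1}) + \frac{1}{k}\,b_k. \tag{17.14}$$

The key point is that final combination of the old average and the new measurement. You see how they combine to give the new average $(b_1 + \cdots + b_k)/k$. We did not have to add all the old and new $b_j$ to compute their average $\hat{x}_k$! The first $k - 1$ measurements were already averaged in $\hat{x}_{k-1}$, and the correction gave the new average with only one more addition.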
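The recursion (17.12) to (17.14) takes only a few lines of code. The following sketch is our own illustration with made-up pulse readings; it starts from $P^{-1} = 0$ (no prior information), so that the first gain is $K_1 = 1$ and the first estimate is $b_1$.

```python
def recursive_average(readings, sigma2):
    """Recursive least squares for the one-parameter model x = b_j.

    Returns a list of (estimate, variance P_k, gain K_k) after each reading.
    """
    x_hat = 0.0
    information = 0.0                  # P^{-1}, zero before any reading
    history = []
    for b in readings:
        information += 1.0 / sigma2    # (17.12): P_k^{-1} = P_{k-1}^{-1} + 1/sigma^2
        P = 1.0 / information
        K = P / sigma2                 # (17.13): K_k = 1/k
        x_hat += K * (b - x_hat)       # (17.14): correct by gain times innovation
        history.append((x_hat, P, K))
    return history

readings = [72.0, 75.0, 71.0, 74.0]    # illustrative pulse readings
history = recursive_average(readings, sigma2=4.0)
for k, (x_hat, P, K) in enumerate(history, start=1):
    print(f"k = {k}: estimate {x_hat:.4f}, P = {P:.4f}, K = {K:.4f}")
```

After the fourth reading the estimate equals the plain average of the readings, the variance is $\sigma^2/4$, and the gain is $1/4$.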


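Example 17.2 (Same static problem but change to $\Sigma_{e,\text{new}} = \sigma_k^2$ for the $k$th measurement) We can show directly that the update is still correct when the variance for the $k$th observation changes from $\sigma^2$ to $\sigma_k^2$. The matrix $A$ is the same column of ones. The last entry of $\Sigma$ is now $\sigma_k^2$ and a direct calculation gives $P_k^{-1} = A^T\Sigma^{-1}A$:

$$P_k^{-1} = \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} 1/\sigma^2 & & & \\ & \ddots & & \\ & & 1/\sigma^2 & \\ & & & 1/\sigma_k^2 \end{bmatrix} \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} = \frac{k-1}{\sigma^2} + \frac{1}{\sigma_k^2}. \tag{17.15}$$

This shows the update from $P_{k-1}^{-1}$, no problem. Then $\hat{x}_k$ can come from $(A^TCA)^{-1}A^TCb$, with $C = \Sigma^{-1}$:

$$\text{Directly:}\quad \hat{x}_k = P_k\,(A^T\Sigma^{-1}b) = P_k\Bigl(\frac{b_1 + \cdots + b_{k-1}}{\sigma^2} + \frac{b_k}{\sigma_k^2}\Bigr). \tag{17.16}$$

Compare with the recursive form. The $k$th gain $K_k$ is $P_k A_{\text{new}}^T\Sigma_{e,\text{new}}^{-1} = P_k/\sigma_k^2$. Then the updated $\hat{x}_k$ is just

$$\hat{x}_k = \hat{x}_{k-1} + P_k\,(b_k - \hat{x}_{k-1})/\sigma_k^2. \tag{17.17}$$

To verify the equivalence to (17.16), introduce $P_kP_k^{-1} = 1$ into (17.17) and rewrite using (17.15):

$$P_k^{-1}\hat{x}_k = P_k^{-1}\hat{x}_{k-1} + \frac{b_k - \hat{x}_{k-1}}{\sigma_k^2} = \frac{k-1}{\sigma^2}\cdot\frac{b_1 + \cdots + b_{k-1}}{k-1} + \frac{\hat{x}_{k-1}}{\sigma_k^2} + \frac{b_k}{\sigma_k^2} - \frac{\hat{x}_{k-1}}{\sigma_k^2} = \frac{b_1 + \cdots + b_{k-1}}{\sigma^2} + \frac{b_k}{\sigma_k^2}.$$

The key point is that $\hat{x}_{k-1}$ already captured the needed information from $b_1, \ldots, b_{k-1}$. Those measurements are no longer wanted! And we still owe the reader an explanation of the gain matrix $K_k$.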
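A numerical check of this equivalence is a useful safeguard against sign and index slips. The sketch below compares the direct weighted average with the recursive update for one set of made-up readings; the function names and the numbers are ours.

```python
def direct_estimate(readings, sigma2, sigma2_last):
    """Weighted least squares with all readings at once, as in (17.15)-(17.16)."""
    k = len(readings)
    information = (k - 1) / sigma2 + 1.0 / sigma2_last           # P_k^{-1}
    right_side = sum(readings[:-1]) / sigma2 + readings[-1] / sigma2_last
    P = 1.0 / information
    return P * right_side, P

def recursive_estimate(readings, sigma2, sigma2_last):
    """Update of the old average by the last reading, as in (17.17)."""
    k = len(readings)
    x_prev = sum(readings[:-1]) / (k - 1)        # average of the first k-1 readings
    P = 1.0 / ((k - 1) / sigma2 + 1.0 / sigma2_last)
    gain = P / sigma2_last                       # K_k = P_k / sigma_k^2
    return x_prev + gain * (readings[-1] - x_prev), P

readings = [72.0, 75.0, 71.0, 74.0]              # the last reading is more precise
x_direct, P_direct = direct_estimate(readings, sigma2=4.0, sigma2_last=1.0)
x_recursive, P_recursive = recursive_estimate(readings, sigma2=4.0, sigma2_last=1.0)
print(x_direct, x_recursive, P_direct)
```

Because the last reading has the smaller variance, the estimate moves further toward it than the plain average would.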


17.2 Static and Dynamic Updates 


The first measurement is $b_0 = A_0x_0 + e_0$. We estimate $x_0$ using this set of observations $b_0$. The estimate is $\hat{x}_0$. The equation $A_0x_0 = b_0$ is probably overdetermined. The least-squares solution, weighted according to the inverse $C = \Sigma_{e,0}^{-1}$ of the covariance matrix for the residuals of $b_0$, is the usual one:

$$\hat{x}_0 = (A_0^T\Sigma_{e,0}^{-1}A_0)^{-1}A_0^T\Sigma_{e,0}^{-1}b_0. \tag{17.18}$$

The error $x_0 - \hat{x}_0$ has expectation zero and its covariance matrix (which is a minimum for this choice of weight) is

$$P_0 = E\{(x_0 - \hat{x}_0)(x_0 - \hat{x}_0)^T\} = (A_0^T\Sigma_{e,0}^{-1}A_0)^{-1}. \tag{17.19}$$

Now we ask: If more data are available, is it possible to estimate $x$ for the total system $A_0x = b_0$, $A_1x = b_1$ without starting the computation from the beginning with $b_0$?

It is easy to imagine this situation. We want to estimate a static position (latitude, longitude and height) by means of a GPS receiver. At an arbitrary moment we estimate the best position. The next observation is likely to deviate a little from the earlier estimate and we should generate a new estimate based on the increased data set. If the error of the new observation is independent of earlier errors there must be a way to work recursively. We want to find $\hat{x}_1$ based on the earlier estimate $\hat{x}_0$ and the new observation $b_1$. (Point of notation: $\hat{x}_k$ depends on all measurements $b_0, \ldots, b_k$. It does not come only from $b_k$.)

The final result should be identical to the one we get by calculating $\hat{x}_1$ from the beginning. The right choice for the weights is $C = \Sigma_e^{-1}$ where

$$\Sigma_e = \begin{bmatrix} \Sigma_{e,0} & 0 \\ 0 & \Sigma_{e,1} \end{bmatrix} \text{ is the covariance matrix of the residuals } \begin{bmatrix} e_0 \\ e_1 \end{bmatrix}.$$

The matrix $\Sigma_e$ is block diagonal because $e_1$ is independent of $e_0$. So the coefficient matrix $A^T\Sigma_e^{-1}A$ in the equation for $\hat{x}_1$ is

$$\begin{bmatrix} A_0^T & A_1^T \end{bmatrix} \begin{bmatrix} \Sigma_{e,0} & 0 \\ 0 & \Sigma_{e,1} \end{bmatrix}^{-1} \begin{bmatrix} A_0 \\ A_1 \end{bmatrix} = A_0^T\Sigma_{e,0}^{-1}A_0 + A_1^T\Sigma_{e,1}^{-1}A_1. \tag{17.20}$$
Remember that $\hat{x}_1$ is best for the combined system $A_0x_1 = b_0$, $A_1x_1 = b_1$. Now the normal equations are $A^T\Sigma_e^{-1}A\hat{x} = A^T\Sigma_e^{-1}b$. We write $P$ or $P_1$ or $P_{\text{new}}$ for the inverse matrix $(A^T\Sigma_e^{-1}A)^{-1}$. Then the optimal solution is

$$\hat{x}_1 = P_1 \begin{bmatrix} A_0^T & A_1^T \end{bmatrix} \Sigma_e^{-1} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = P_1\,(A_0^T\Sigma_{e,0}^{-1}b_0 + A_1^T\Sigma_{e,1}^{-1}b_1). \tag{17.21}$$

It is this solution that we hope to find recursively by using the already computed $\hat{x}_0$ instead of $b_0$. The difficulty is that the $b_0$ term is now multiplied by $P_1$ instead of $P_0$. So we update the coefficient matrix in the normal equations by means of (17.20):

$$P_1^{-1} = P_0^{-1} + A_1^T\Sigma_{e,1}^{-1}A_1. \tag{17.22}$$

We realize that the covariance matrix $P_1 = P_{\text{new}}$ for the updated estimate is "smaller" than $P_0 = P_{\text{old}}$. The addition of new information should reduce the covariance matrix. The last term describes the increase in information originating from the latest observations. Note again that (17.22) does not depend on the individual observations $b_0$ and $b_1$. It only exploits their statistical characteristics to determine the characteristics of $\hat{x}_1$.

Of course, the estimate $\hat{x}_1$ is based upon the actual observations $b_0$ and $b_1$. It is calculated according to (17.21) and the entire recursive least-squares procedure is about rewriting this expression. The first step uses (17.18) and (17.19), which give $A_0^T\Sigma_{e,0}^{-1}b_0 = P_0^{-1}\hat{x}_0$; the second step uses (17.22):

$$\begin{aligned} \hat{x}_1 &= P_1\,(P_0^{-1}\hat{x}_0 + A_1^T\Sigma_{e,1}^{-1}b_1) = P_1\,(P_1^{-1}\hat{x}_0 - A_1^T\Sigma_{e,1}^{-1}A_1\hat{x}_0 + A_1^T\Sigma_{e,1}^{-1}b_1) \\ &= \hat{x}_0 + K_1\,(b_1 - A_1\hat{x}_0). \end{aligned} \tag{17.23}$$

This formula identifies the gain matrix that multiplies the innovation:

$$\text{Gain Matrix:}\quad K_1 = P_1A_1^T\Sigma_{e,1}^{-1}. \tag{17.24}$$

By this manipulation the formula for $\hat{x}_1$ has become recursive; it uses $\hat{x}_0$ instead of $b_0$. This is the result we want and it is easy to explain.

If the new observations are consistent with the original $\hat{x}_0$ then $b_1 = A_1\hat{x}_0$. In this case the mismatch is zero. Consequently there is no reason for changing our estimate of $x$. The best choice is $\hat{x}_1 = \hat{x}_0$ when the new observations $b_1$ still lead to the old positions $\hat{x}_0$. In general this is not the case. There will be an error of prediction $b_1 - A_1\hat{x}_0$ which is called the innovation. That is the unexpected part of $b_1$ and (17.23) multiplies by the gain matrix $K_1$ in order to give the correction. Now the new $P_1$ and $\hat{x}_1$ contain all we know. We are ready to receive another new observation $b_2$.

When $b_2$ arrives we look again at (17.22) and (17.23). The algebra that provided those formulas can be used in the new situation. Indices 0 and 1 are changed to 1 and 2 (and later to $k - 1$ and $k$). We obtain the fundamental theorem for "static" recursive least-squares (expressed for $P^{-1}$ where the "dynamic" Kalman filter updates $P$):

The least-squares estimate $\hat{x}_k$ and its covariance matrix $P_k$ are determined recursively by

$$\begin{aligned} P_k^{-1} &= P_{k-1}^{-1} + A_k^T\Sigma_{e,k}^{-1}A_k \\ \hat{x}_k &= \hat{x}_{k-1} + K_k\,(b_k - A_k\hat{x}_{k-1}) \quad\text{with}\quad K_k = P_kA_k^T\Sigma_{e,k}^{-1}. \end{aligned} \tag{17.25}$$
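The theorem can be tried out on a small problem with two unknowns. The sketch below fits a straight line $b = c + dt$ to four made-up observations, once by solving the full normal equations and once by starting from the first two observations and applying the recursive update for each later one. Everything here (names, data, unit variance) is illustrative, and the matrices are kept as plain 2 by 2 lists.

```python
def inv2(M):
    """Inverse of a 2 by 2 matrix given as nested lists."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def batch(rows, obs, var):
    """Solve the normal equations; return the estimate and P^{-1}."""
    N = [[sum(r[i] * r[j] for r in rows) / var for j in range(2)]
         for i in range(2)]
    rhs = [sum(r[i] * b for r, b in zip(rows, obs)) / var for i in range(2)]
    P = inv2(N)
    x = [P[i][0] * rhs[0] + P[i][1] * rhs[1] for i in range(2)]
    return x, N

def recursive_update(x_hat, P_inv, row, b, var):
    """One step of the static update for a single new observation b."""
    P_inv_new = [[P_inv[i][j] + row[i] * row[j] / var for j in range(2)]
                 for i in range(2)]                       # information update
    P = inv2(P_inv_new)
    K = [(P[i][0] * row[0] + P[i][1] * row[1]) / var for i in range(2)]
    innovation = b - (row[0] * x_hat[0] + row[1] * x_hat[1])
    return [x_hat[i] + K[i] * innovation for i in range(2)], P_inv_new

times = [0.0, 1.0, 2.0, 3.0]
obs = [1.0, 2.9, 5.1, 7.0]
rows = [[1.0, t] for t in times]          # each row of A is [1, t]
var = 1.0

x_hat, P_inv = batch(rows[:2], obs[:2], var)          # start from two observations
for row, b in zip(rows[2:], obs[2:]):
    x_hat, P_inv = recursive_update(x_hat, P_inv, row, b, var)

x_batch, N_batch = batch(rows, obs, var)              # all observations at once
print(x_hat, x_batch)
```

Both routes give the same intercept and slope, and the accumulated $P^{-1}$ equals the full normal matrix, without the old observations ever being revisited.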
Dynamic Problems and State Equation

Now comes the big step. The state of our system is modeled by the state vector $x(t)$. This vector changes with time. Part of that change is described by a known differential equation or difference equation (taken to be linear). This state equation (we will work in discrete time with a one-step difference equation) also allows for an error $\epsilon_k$ at each time step:

$$x_k = F_{k-1}\,x_{k-1} + \epsilon_k. \tag{17.26}$$

The known matrix $F_{k-1}$ is the state transition matrix. The unknown vector $x_k$ is the state at time $k$, and its estimate will be $\hat{x}_k$. In addition to information about $x_k$ from the state equation, we make direct measurements $b_k$ at time $k$:

$$b_k = A_kx_k + e_k. \tag{17.27}$$

It is this system of equations (17.26) and (17.27) at $t = 0, 1, \ldots, k$ that the Kalman filter solves by weighted least squares.

The weights are as always the inverses of the covariance matrices. We assume that the errors $\epsilon_k$ and $e_k$ are independent and Gaussian, with mean zero and known covariance matrices $\Sigma_{\epsilon,k}$ and $\Sigma_{e,k}$:

$$E\{\epsilon_k\epsilon_k^T\} = \Sigma_{\epsilon,k} \qquad\text{and}\qquad E\{e_ke_k^T\} = \Sigma_{e,k}. \tag{17.28}$$

Now Kalman's problem is completely stated. One way to solve it would be to construct long column vectors $\mathbf{b}$ and $\mathbf{x}$ and $\mathbf{e}$ (different lengths!) and a big rectangular matrix $\mathcal{A}$:

$$\mathbf{b} = \begin{bmatrix} b_0 \\ 0 \\ b_1 \\ \vdots \\ 0 \\ b_k \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_k \end{bmatrix}, \quad \mathbf{e} = \begin{bmatrix} e_0 \\ \epsilon_1 \\ e_1 \\ \vdots \\ \epsilon_k \\ e_k \end{bmatrix}, \quad \mathcal{A} = \begin{bmatrix} A_0 & & & \\ -F_0 & I & & \\ & A_1 & & \\ & \ddots & \ddots & \\ & & -F_{k-1} & I \\ & & & A_k \end{bmatrix}.$$

Our model is exactly $\mathcal{A}\mathbf{x} = \mathbf{b} - \mathbf{e}$. The optimal estimate $\hat{\mathbf{x}}$ solves the normal equations $\mathcal{A}^T\Sigma^{-1}\mathcal{A}\hat{\mathbf{x}} = \mathcal{A}^T\Sigma^{-1}\mathbf{b}$. The big covariance matrix $\Sigma$ is block diagonal, because we assume that the error vectors $e_0, \epsilon_1, e_1, \ldots, e_{k-1}, \epsilon_k, e_k$ are independent of each other (but might be correlated within those vectors). The blocks of $\Sigma$ are $\Sigma_{e,0}, \Sigma_{\epsilon,1}, \ldots, \Sigma_{e,k-1}, \Sigma_{\epsilon,k}, \Sigma_{e,k}$. Then the solution is a long vector $\hat{\mathbf{x}}_k$ containing our best estimate of the whole history:

$$\hat{\mathbf{x}}_k = \begin{bmatrix} \hat{x}_{0|k} \\ \vdots \\ \hat{x}_{k-1|k} \\ \hat{x}_{k|k} \end{bmatrix}.$$

The second subscript indicates that we have used all information up to and including time $k$. This is a point to emphasize: early state estimates are eventually affected by the later measurements. It is useful to think separately of those corrected (often called smoothed) values for the earlier states $x_0, \ldots, x_{k-1}$. In using the Kalman filter we may or may not go back in time to compute them. The one thing we are sure to compute is the estimate $\hat{x}_{k|k}$ for the current state:

Smoothed values (of earlier states $j < k$): $\hat{x}_{j|k}$
Filtered value (of the current state): $\hat{x}_{k|k}$
Predicted value (of the next state): $\hat{x}_{k+1|k} = F_k\hat{x}_{k|k}$.


The prediction comes directly from the state equation (17.26). It does not account for the next measurement $b_{k+1}$. The Kalman filter will operate recursively, starting with the prediction and correcting it (based on the mismatch between prediction and observation). It will do this for the next state $k + 1$, and it did this for the current state $k$:

$$\text{Prediction:} \quad \hat{x}_{k|k-1} = F_{k-1}\hat{x}_{k-1|k-1} \tag{17.29}$$

$$\text{Correction:} \quad \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\,(b_k - A_k\hat{x}_{k|k-1}). \tag{17.30}$$


The correction is needed because $b_k$ contains new information. The gain $K_k$ involves the covariance matrices, which tell the reliability of the inputs and how much weight to give to them. The covariance matrices $\Sigma_{e,k}$ and $\Sigma_{\epsilon,k}$ for the observation and state equations contribute to the covariance matrix $P_{k|k}$ for the error in $\hat{x}_{k|k}$. And this $P_{k|k}$ is also found by prediction and correction, rather than computing $(\mathcal{A}^T\Sigma^{-1}\mathcal{A})^{-1}$ from big matrices:

Predicted Covariance: $P_{k|k-1} = F_{k-1}P_{k-1|k-1}F_{k-1}^T + \Sigma_{\epsilon,k}$
Gain Matrix: $K_k = P_{k|k-1}A_k^T\,(A_kP_{k|k-1}A_k^T + \Sigma_{e,k})^{-1}$
Corrected Covariance: $P_{k|k} = (I - K_kA_k)P_{k|k-1}$.


Notice again that these covariance calculations do not depend on the actual measurements $b_k$. We often compute the reliability $P_{k|k}$ of $\hat{x}_{k|k}$ in advance. (And if we are not satisfied, we might never compute $\hat{x}_{k|k}$ from the observations.) In on-the-fly filtering the recursions for $P_{k|k}$ and $\hat{x}_{k|k}$ go forward in parallel. We store earlier values only if we plan to return and smooth them.
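Because the covariance recursion never touches the data, it can be tabulated before any observation arrives. The following is a minimal scalar sketch in Python, not taken from the text: the function name `covariance_schedule` and the choice $F = A = 1$ with unit variances and starting value $P = 1$ are assumptions made only for illustration.

```python
from fractions import Fraction

def covariance_schedule(p0, f, a, var_state, var_obs, steps):
    """Scalar covariance recursion: predicted P, gain K, corrected P at each step.

    No measurement b_k appears anywhere, so the schedule can be computed in advance.
    """
    p = Fraction(p0)
    schedule = []
    for _ in range(steps):
        p_pred = f * p * f + var_state                   # predicted covariance
        gain = p_pred * a / (a * p_pred * a + var_obs)   # gain (a scalar here)
        p = (1 - gain * a) * p_pred                      # corrected covariance
        schedule.append((p_pred, gain, p))
    return schedule

# Hypothetical example values: F = A = 1 and both variances equal to 1.
schedule = covariance_schedule(p0=1, f=1, a=1, var_state=1, var_obs=1, steps=3)
for k, (p_pred, gain, p_corr) in enumerate(schedule, start=1):
    print(f"k={k}: predicted P={p_pred}, gain K={gain}, corrected P={p_corr}")
```

Exact rational arithmetic is used so that the fractions can be compared by eye with hand calculations; a matrix version would replace the scalar products by matrix products and the division by a linear solve.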

It remains to verify the correctness of these update equations, and to express them in different forms. The symmetry of $P_{k|k}$ is not so clear from the expression above, and the gain matrix $K_k$ can take a simpler form. First we comment on the notation. Then the reader can quickly match our (standard) symbols with other (also standard) symbols in the literature.


Notation 


Everybody agrees that $x$ is the state and $K$ is the gain matrix. But there are several alternatives for the covariance matrices. Up to now we have consistently used $\Sigma$, indicating by $\Sigma_b$ or $\Sigma_{\hat{x}}$ which covariance matrix we mean. With more subscripts needed in this chapter, we introduce the letter $P$ to replace $\Sigma_{\hat{x}}$. Thus the covariance matrices are $\Sigma_{e,k}$ (observation), $\Sigma_{\epsilon,k}$ (state equation), $P_{k|k}$ (estimate $\hat{x}_{k|k}$ of $x_k$). Other authors use $R_k$ (observation) and $Q_k$ (state equation) for those first two covariance matrices. We found it simplest to use $\epsilon_k$ (not $\epsilon_{k-1}$) for the error in the state equation for $x_k$.

Table 17.1 Commonly used notations for a discrete Kalman filter

                                          This book               Gelb et al.    Grewal et al.
Matrix in observation equations           $A_k$
State transition matrix                   $F_{k-1}$
Vector of observations at time k          $b_k$
State vector at time k                    $x_k$
Error in observation $b_k$                $e_k$
System error in state equation            $\epsilon_k$
Kalman gain matrix                        $K_k$
Covariance matrix of error in $b_k$       $\Sigma_{e,k}$
Covariance matrix of system               $\Sigma_{\epsilon,k}$
Predicted covariance for $\hat{x}_{k|k-1}$    $P_{k|k-1}$
Corrected covariance for $\hat{x}_{k|k}$      $P_{k|k}$
Predicted estimate of $x_k$               $\hat{x}_{k|k-1}$
Corrected estimate of $x_k$               $\hat{x}_{k|k}$

Now for the subscripts. They must indicate the time k to which the estimate applies. 
They must also indicate the time when the estimate is made. This is often k — 1 (for the 
prediction) and k (for the correction). We could use the words old and new, or the symbols 
(—) and (+), or the subscripts k — 1 and k. We chose the subscripts because they are the 
most explicit: 


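Predictions: $P_k(-) = P_{k,\text{old}} = P_{k|k-1}$ and $\hat{x}_k(-) = \hat{x}_{k,\text{old}} = \hat{x}_{k|k-1}$
Corrections: $P_k(+) = P_{k,\text{new}} = P_{k|k}$ and $\hat{x}_k(+) = \hat{x}_{k,\text{new}} = \hat{x}_{k|k}$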


Table 17.1 shows our notation along with the notations in two of our favorite books. There 
are more notations in use but you would not want to know them. 


17.3 The Steady Model 


We can compare the Kalman filter with ordinary least squares on a problem where they might be expected to give the same answer, but they don’t. It is known as a steady model, since all observations $b_1, \ldots, b_n$ measure the same scalar quantity $x$. Furthermore all errors are independent with variance $\sigma^2 = 1$. The two problems look identical:

1 The least-squares solution of the $n$ equations $x = b_k$ is the average

$$\hat{x} = \frac{b_1 + b_2 + \cdots + b_n}{n} \quad\text{with variance}\quad P = \frac{1}{n}.$$

This was checked earlier by the formulas of recursive least squares.

2 The Kalman filter solution comes from the observation equations $x_k = b_k$ together with the steady model $x_k = x_{k-1}$. Thus every $F = 1$ in this state equation.


Nevertheless there is a difference. It comes from the presence of errors $\epsilon_k$ in the state equation. We have $x_k = x_{k-1} + \epsilon_k$ as well as $x_k = b_k - e_k$. Thus we are at the same time assuming that $x$ does not change, and allowing for the possibility that it does. When it drifts away from $x_0$, the latest $\hat{x}_{k-1|k-1}$ is the predicted value $\hat{x}_{k|k-1}$ (since $F = 1$). Then after we measure $b_k$, the corrected estimate $\hat{x}_{k|k}$ is a combination of prediction and measurement. As in least squares, this must use all the measurements, but the recent $b_k$ are weighted more heavily. In ordinary least squares, which does not allow for drift and just computes the average, the $b_k$ are weighted equally.

The difference can be seen after two measurements and again after three. The equations that combine $x_k = b_k$ with $x_k = x_{k-1}$ are

$$\begin{bmatrix} 1 & 0 \\ -1 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \end{bmatrix} =
\begin{bmatrix} b_0 \\ 0 \\ b_1 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} b_0 \\ 0 \\ b_1 \\ 0 \\ b_2 \end{bmatrix}. \tag{17.31}$$


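If we force $x_k = x_{k-1}$ to hold exactly, we are back to ordinary least squares with one unknown. But if we allow drift errors $\epsilon_k$, also of variance one, then it is the equations in (17.31) that are solved by least squares. The unknowns are the states $x_0, \ldots, x_k$. The matrix will be called $\mathcal{A}$ instead of $A$. The normal equations $\mathcal{A}^T\mathcal{A}\hat{\mathbf{x}} = \mathcal{A}^T\mathbf{b}$ are

$$\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}
\begin{bmatrix} \hat{x}_0 \\ \hat{x}_1 \end{bmatrix} =
\begin{bmatrix} b_0 \\ b_1 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 2 & -1 & 0 \\ -1 & 3 & -1 \\ 0 & -1 & 2 \end{bmatrix}
\begin{bmatrix} \hat{x}_0 \\ \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} =
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}. \tag{17.32}$$

You see how the lower corner is changed (a diagonal entry changes from 2 to 3) as the new row and column are added. This pattern would continue: $\mathcal{A}^T\mathcal{A}$ has 3’s along its diagonal, except for 2’s at the top and bottom. The factors in $LDL^T$ or in $QR$ change only at the bottom, which is the reason the Kalman filter works!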

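We can see what happens without complicated formulas. The inverses in (17.32) are

$$(\mathcal{A}^T\mathcal{A})^{-1} = \frac{1}{3}\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}
\quad\text{and}\quad
(\mathcal{A}^T\mathcal{A})^{-1} = \frac{1}{8}\begin{bmatrix} 5 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 5 \end{bmatrix}. \tag{17.33}$$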
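The 3 by 3 inverse can be reproduced in exact arithmetic. The sketch below is an illustration in Python rather than anything from the text; the helper names `steady_matrix`, `gram` and `inverse` are hypothetical, and the rows of the matrix are assumed to alternate between observation equations and state equations.

```python
from fractions import Fraction

def steady_matrix(n):
    """Rows alternate: observation x_k = b_k, then state equation -x_k + x_{k+1} = 0."""
    rows = []
    for k in range(n):
        obs = [0] * n
        obs[k] = 1
        rows.append(obs)
        if k + 1 < n:
            state = [0] * n
            state[k], state[k + 1] = -1, 1
            rows.append(state)
    return rows

def gram(a):
    """Form the normal-equation matrix A^T A."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]

def inverse(m):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

normal = gram(steady_matrix(3))   # the tridiagonal matrix with diagonal 2, 3, 2
inv = inverse(normal)             # each row holds the weights on b_0, b_1, b_2
for row in inv:
    print([str(v) for v in row])
```

Since the right side of the normal equations is just the vector of measurements in this model, each row of the inverse lists the weights that one state estimate places on the measurements.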


After the measurements $b_0$ and $b_1$, the best estimates come from the first equation in (17.32). Multiplying by $(\mathcal{A}^T\mathcal{A})^{-1}$ gives, not the ordinary average, but

$$\hat{x}_{0|1} = \frac{2b_0 + b_1}{3} \quad\text{and}\quad \hat{x}_{1|1} = \frac{b_0 + 2b_1}{3}.$$

The new data $b_1$ has a heavier weight 2/3 in the filtered estimate $\hat{x}_{1|1}$ of $x_1$. And $b_1$ also appears in the smoothed estimate $\hat{x}_{0|1}$ of $x_0$. There it is weighted less heavily, but still we know more about $x_0$ after measuring $b_1$.

With $b_2$ included, the smoothed and filtered estimates change to

$$\hat{x}_{0|2} = \frac{5b_0 + 2b_1 + b_2}{8}, \qquad \hat{x}_{1|2} = \frac{2b_0 + 4b_1 + 2b_2}{8}, \qquad \hat{x}_{2|2} = \frac{b_0 + 2b_1 + 5b_2}{8}.$$


The last is the most important. It is the best estimate of $x_2$, using $b_0$ and $b_1$ but emphasizing $b_2$. The possibility of drift produces an exponential decay of the weight attached to old measurements.

The other half of the Kalman filter computes $P_k$ (here it is a scalar), which appears as the last entry of $(\mathcal{A}^T\mathcal{A})^{-1}$: $P_1 = 2/3$ and $P_2 = 5/8$. From $b_0$ alone we would have had $P_0 = 1$. Each $P_k$ gives the reliability of the estimate $\hat{x}_{k|k}$. Thus the estimation errors are steadily decreasing. The reciprocals 1, 3/2, 8/5 are the information matrices and they increase with every new measurement of the model.


Asymptotics of the Steady Model 


The 2 by 2 and 3 by 3 examples above bring out the difference between the static problem with fixed $x$ and the dynamic problem with evolving $x$. Out of curiosity, we find the limiting behavior of the $n$ by $n$ dynamic problem as $n$ becomes large. We will keep $\Sigma_e = 1$ and also $\Sigma_\epsilon = 1$. It would be interesting to allow different variances $\sigma_e^2$ and $\sigma_\epsilon^2$ in the observation and state equations, and recompute the asymptotic limits. This model will give numerical examples of the Kalman filter in Section 17.6, where $x$ is the offset in the receiver clock. The steady model is quite relevant to GPS (and the figures will show rapid convergence to the limiting value).

The crucial matrix is $V = \mathcal{A}^T\mathcal{A}$. It is tridiagonal with entries $-1, 3, -1$ in every row (except $V_{11} = V_{nn} = 2$). This is because every column of $\mathcal{A}$ has 3 nonzero entries except the first and last columns; all nonzero entries are 1 or $-1$. Since $\mathcal{A}^T\mathcal{A}$ is positive definite, it has a Cholesky factorization into $LL^T$. Since $\mathcal{A}^T\mathcal{A}$ is tridiagonal, the triangular factors $L$ and $L^T$ are bidiagonal.

Consider first the matrix $T$ that has the steady $-1, 3, -1$ pattern (including the ends $T_{11} = 3$ and $T_{nn} = 3$). We may expect the entries of its triangular factors to approach limits $a$ on the main diagonal and $b$ on the off-diagonal. It is those limits $a$ and $b$ that we now compute, by equating to the entries of $T$:

$$a^2 + b^2 = 3 \ \text{ on the diagonal} \quad\text{and}\quad ab = -1 \ \text{ off the diagonal.}$$

Substituting $b^2 = 1/a^2$ into the first equation leads to $a^4 - 3a^2 + 1 = 0$. This quadratic equation in $a^2$ yields the values of the limits $a$ and $b$:

$$a^2 = \frac{3 + \sqrt{5}}{2} \quad\text{and}\quad b^2 = \frac{3 - \sqrt{5}}{2}.$$


The diagonal $a$ dominates the off-diagonal $b$ since $T$ is positive definite. We take the positive square root for $a$ and the negative square root for $b$. Now we need the inverse matrix $T^{-1}$. In particular we want its last entry $(T^{-1})_{nn}$, in the lower right corner. For upper triangular times lower triangular, $(T^{-1})_{nn}$ comes from the last entries (asymptotically $1/a$):

$$(T^{-1})_{nn} \approx \frac{1}{a^2} = b^2. \tag{17.34}$$

Now we must account for the changes from 3 to 2 in the corners, when $T$ becomes $V$. The change in the $(1, 1)$ entry has exponentially small effect on the $(n, n)$ entry of $V^{-1} = (\mathcal{A}^T\mathcal{A})^{-1}$. But the change from $T_{nn} = 3$ to $V_{nn} = 2$ is significant. If $e^T = [\,0\ \ldots\ 0\ 1\,]$ then this is a rank-one change by $ee^T$. The Sherman-Morrison-Woodbury-Schur formula in Section 17.4 gives the rank-one change in the inverse:

$$T^{-1} = (V + ee^T)^{-1} = V^{-1} - \frac{V^{-1}ee^TV^{-1}}{1 + e^TV^{-1}e}. \tag{17.35}$$

Take the $(n, n)$ entries and denote $(V^{-1})_{nn} = e^TV^{-1}e$ by $P$:

$$\frac{3 - \sqrt{5}}{2} = P - \frac{P^2}{1 + P} = \frac{P}{1 + P}.$$

This yields $P = (\sqrt{5} - 1)/2 \approx 0.618$ which is confirmed by MATLAB experiment. It is the weight assigned to the latest measurement $b_n$ in the formula for $\hat{x}_{n|n}$ in the limit as $n \to \infty$. This is the “golden mean” for the steady model.
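The limiting value can also be checked by simply iterating the scalar covariance recursion. The following Python sketch is an illustration under the assumptions of the steady model ($F = A = 1$, both variances 1); the function name `limiting_variance` and the starting value $P = 1$ are choices made here, not taken from the text.

```python
import math

def limiting_variance(iterations=40):
    """Iterate predict/correct for the steady model and return the corrected variance."""
    p = 1.0                            # variance after the first measurement alone
    for _ in range(iterations):
        p_pred = p + 1.0               # prediction adds the state-equation variance
        p = p_pred / (p_pred + 1.0)    # correction; this number is also the gain
    return p

golden = (math.sqrt(5) - 1) / 2
print(limiting_variance(), golden)
```

In this model the corrected variance and the gain are the same number at every step, which is why the limit is also the weight on the latest measurement.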


Fibonacci Numbers 


We want to offer more about this steady model, which is the simplest of all Kalman filters. Each matrix $F_k$ in the state equation and each $A_k$ in the observation equation and each variance $\Sigma_{\epsilon,k}$ and $\Sigma_{e,k}$ is the single number 1. Part of our motivation is the innocent pleasure of meeting Fibonacci numbers. They enter the explicit formulas for $\hat{x}$ and its covariance $P$ at every step. You will appreciate the contrast with the matrix manipulations in the next section, where the Kalman formulas are derived.

Of course those general update formulas, when applied to this steady model, will yield the Fibonacci numbers. ($F_4$ and $F_5$ and $F_6$ appeared above, in the fractions 3/5 and 5/8.) But here we can provide complete detail. These discoveries were made jointly with Steven L. Lee. We will compute the $\mathcal{A} = QR$ factorization (Gram-Schmidt) and also the $\mathcal{A}^T\mathcal{A} = LDL^T$ factorization (Cholesky or symmetric Gauss).

Recall that Fibonacci’s numbers 0, 1, 1, 2, 3, 5, 8, ... arise from $F_k = F_{k-1} + F_{k-2}$. They start with $F_0 = 0$ and $F_1 = 1$. Our first step is now to identify the determinants of the $-1, 3, -1$ tridiagonal matrices $T_n$:

$$T_1 = \begin{bmatrix} 3 \end{bmatrix}, \quad
T_2 = \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix}, \quad
T_3 = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 3 & -1 \\ 0 & -1 & 3 \end{bmatrix}.$$


Those have determinants 3, 8, and 21. The $n$ by $n$ matrix $T_n$ has determinant $F_{2n+2}$. The natural induction proof is to use the cofactors of the first row to find the recursion formula

$$\det T_n = 3\,(\det T_{n-1}) - (\det T_{n-2}). \tag{17.36}$$

The Fibonacci numbers $F_{2n+2}$ satisfy the same recursion, because $F_{2n+2} = F_{2n+1} + F_{2n} = (F_{2n} + F_{2n-1}) + F_{2n} = 2F_{2n} + (F_{2n} - F_{2n-2}) = 3F_{2n} - F_{2n-2}$. Thus $F_{2n+2}$ is the determinant.

The matrices $V = \mathcal{A}^T\mathcal{A}$ are slightly different from $T$, because $\mathcal{A}$ has only two nonzeros in its first and last columns. (All nonzero entries are 1 or $-1$ from the state and observation equations. The 1 by 1 matrix has only a single 1 from the first observation.) Thus $V$ differs from $T$ by subtracting 1 from its first entry and also from its last entry. Here are the matrices $V = \mathcal{A}^T\mathcal{A}$:

$$V_1 = \begin{bmatrix} 1 \end{bmatrix}, \quad
V_2 = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}, \quad
V_3 = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 3 & -1 \\ 0 & -1 & 2 \end{bmatrix}.$$

The determinants are now 1, 3, and 8. The determinant of $V_n$ is $F_{2n}$. For proof we first subtract 1 from the $(1, 1)$ entry of $T_n$ to reach $U_n$. This reduces the determinant by $\det T_{n-1} = F_{2n}$. Therefore $\det U_n = F_{2n+2} - F_{2n} = F_{2n+1}$. Now subtract 1 from the $(n, n)$ entry of $U_n$ to reach $V_n$. This reduces $\det U_n$ by $\det U_{n-1} = F_{2n-1}$. So $\det V_n = F_{2n+1} - F_{2n-1} = F_{2n}$.
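The determinant claims are easy to test numerically. This Python sketch is an illustration only; it uses the standard three-term recursion for the determinant of a symmetric tridiagonal matrix with off-diagonal entries $-1$, and the helper names `fib`, `tridiagonal_det` and `diagonals` are hypothetical.

```python
def fib(k):
    """Fibonacci number F_k with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

def tridiagonal_det(diagonal):
    """Determinant of a symmetric tridiagonal matrix whose off-diagonal entries are -1."""
    prev, curr = 1, diagonal[0]
    for d in diagonal[1:]:
        prev, curr = curr, d * curr - prev   # D_k = d_k D_{k-1} - D_{k-2}
    return curr

def diagonals(n):
    """Main diagonals of the n by n matrices T, U, V, W."""
    t = [3] * n
    u = [2] + [3] * (n - 1)          # subtract 1 from the (1,1) entry of T
    v = u[:-1] + [u[-1] - 1]         # then subtract 1 from the (n,n) entry
    w = u[:-1] + [u[-1] - 2]         # prediction step: last entry is one lower again
    return t, u, v, w

for n in range(1, 8):
    dets = tuple(tridiagonal_det(d) for d in diagonals(n))
    print(n, dets, (fib(2 * n + 2), fib(2 * n + 1), fib(2 * n), fib(2 * n - 2)))
```

For each order the two printed tuples should coincide, one computed from the matrices and one read off from the Fibonacci sequence.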

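For the reader’s convenience we display four matrices $T, U, V, W$ of order $n = 3$:

$$T = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 3 & -1 \\ 0 & -1 & 3 \end{bmatrix}, \quad
U = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 3 & -1 \\ 0 & -1 & 3 \end{bmatrix}, \quad
V = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 3 & -1 \\ 0 & -1 & 2 \end{bmatrix}, \quad
W = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 3 & -1 \\ 0 & -1 & 1 \end{bmatrix}$$

$$\det T_n = F_{2n+2}, \qquad \det U_n = F_{2n+1}, \qquad \det V_n = F_{2n}, \qquad \det W_n = F_{2n-2}.$$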


The matrix $W_n$ appears at the prediction step of the Kalman filter, before the observation row is included. At that point $\mathcal{A}$ has only a single “1” in its last column, so the last entry of $W_n$ is 1. Note that the order $n$ is $k + 1$ in Kalman’s numbering, which starts at $k = 0$.

Now we factor $V_n$ and $W_n$ into $LDL^T$. Since $V_n$ differs from $W_n$ only in the last entry, their lower triangular factors $L$ will be the same. $L$ contains the number that multiplies row $i$ when we subtract it from row $i + 1$. The pivots agree for $V_n$ and $W_n$, until the $n$th pivot $d_{nn}$. The determinants immediately give these pivots and multipliers:

$$d_{ii} = \frac{F_{2i+1}}{F_{2i-1}} \quad\text{and}\quad l_{i+1,i} = -\frac{F_{2i-1}}{F_{2i+1}} \quad\text{for } i < n. \tag{17.37}$$

The last pivot $d_{nn}$ is $F_{2n}/F_{2n-1}$ for the matrix $V_n$ and $F_{2n-2}/F_{2n-1}$ for $W_n$. The entries in $D^{-1}$ are the reciprocals of the pivots $d_{ii}$. The really attractive formula appears in $L^{-1}$. This inverse matrix is lower triangular with Fibonacci ratios:

$$(L^{-1})_{ij} = \frac{F_{2j-1}}{F_{2i-1}} \quad\text{for } i \ge j. \tag{17.38}$$

Thus for $n = 3$, the factorization $V^{-1} = L^{-T}D^{-1}L^{-1}$ is

$$V^{-1} = \begin{bmatrix} 1 & \tfrac{1}{2} & \tfrac{1}{5} \\ 0 & 1 & \tfrac{2}{5} \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \tfrac{1}{2} & & \\ & \tfrac{2}{5} & \\ & & \tfrac{5}{8} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ \tfrac{1}{2} & 1 & 0 \\ \tfrac{1}{5} & \tfrac{2}{5} & 1 \end{bmatrix}.$$

Every entry of $V^{-1}$ is positive. All the row and column sums are 1. This is a Markov matrix.
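The Fibonacci formulas for the pivots and for $L^{-1}$ can be checked by assembling $L^{-T}D^{-1}L^{-1}$ and multiplying by $V_n$. The Python sketch below is an illustration, not part of the text; the function names are hypothetical, and the formulas coded are (17.37) and (17.38) together with the last pivot $F_{2n}/F_{2n-1}$.

```python
from fractions import Fraction

def fib(k):
    """Fibonacci number F_k with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

def steady_normal_matrix(n):
    """V_n: tridiagonal -1, 3, -1 with 1 subtracted from the first and last diagonal entries."""
    diag = [3] * n
    diag[0] -= 1
    diag[-1] -= 1
    return [[diag[i] if i == j else (-1 if abs(i - j) == 1 else 0)
             for j in range(n)] for i in range(n)]

def inverse_from_fibonacci(n):
    """Assemble V_n^{-1} = L^{-T} D^{-1} L^{-1} from Fibonacci ratios."""
    d_inv = [Fraction(fib(2 * i - 1), fib(2 * i + 1)) for i in range(1, n)]
    d_inv.append(Fraction(fib(2 * n - 1), fib(2 * n)))        # reciprocal of last pivot
    l_inv = [[Fraction(fib(2 * j - 1), fib(2 * i - 1)) if i >= j else Fraction(0)
              for j in range(1, n + 1)] for i in range(1, n + 1)]
    return [[sum(l_inv[m][i] * d_inv[m] * l_inv[m][j] for m in range(n))
             for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

v_inv = inverse_from_fibonacci(3)
print([[str(x) for x in row] for row in v_inv])
print([str(sum(row)) for row in v_inv])   # row sums of the inverse
```

The row sums illustrate the Markov property: since every row of $V_n$ sums to 1, the same holds for its inverse.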
Now go through the steps of the Kalman filter. The $n$th state equation takes us from $V_{n-1}$ to $W_n$. Then the observation equation takes us to $V_n$. Here are the updates:

$$\text{Last entry in } W_n^{-1}: \quad P_{n|n-1} = 1 + P_{n-1|n-1} = 1 + \frac{F_{2n-3}}{F_{2n-2}} = \frac{F_{2n-1}}{F_{2n-2}}$$

$$\text{Gain:} \quad K_n = P_{n|n-1}\,(1 + P_{n|n-1})^{-1} = \frac{F_{2n-1}}{F_{2n-2}} \cdot \frac{F_{2n-2}}{F_{2n}} = \frac{F_{2n-1}}{F_{2n}}$$

$$\text{Last entry in } V_n^{-1}: \quad P_{n|n} = (1 - K_n)\,P_{n|n-1} = \Bigl(1 - \frac{F_{2n-1}}{F_{2n}}\Bigr)\frac{F_{2n-1}}{F_{2n-2}} = \frac{F_{2n-1}}{F_{2n}}.$$

The last entry agrees with $1/d_{nn}$ as it should. The prediction $\hat{x}_{n|n-1}$ is just $\hat{x}_{n-1|n-1}$ from our simple state equation. Then the correction is

$$\hat{x}_{n|n} = (1 - K_n)\,\hat{x}_{n|n-1} + K_n b_n = \frac{F_{2n-2}}{F_{2n}}\,\hat{x}_{n-1|n-1} + \frac{F_{2n-1}}{F_{2n}}\,b_n. \tag{17.39}$$


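In the totally steady case of equal measurements $b_n = b$, this correctly gives all $\hat{x}_{n|n} = b$. The Kalman recursion (17.39) can be unrolled to see how $b_{n-1}$ is multiplied by $F_{2n-3}/F_{2n}$:

$$\hat{x}_{n|n} = \frac{F_{2n-1}}{F_{2n}}\,b_n + \frac{F_{2n-2}}{F_{2n}}\Bigl(\frac{F_{2n-3}}{F_{2n-2}}\,b_{n-1} + \frac{F_{2n-4}}{F_{2n-2}}\,\hat{x}_{n-2|n-2}\Bigr).$$

The explicit matrix inverses multiply $b_{n-1}$ by $F_{2n-3}/F_{2n-1}$ and then $F_{2n-1}/F_{2n}$. So $F_{2n-3}/F_{2n}$ is correct:

$$\begin{bmatrix} \hat{x}_{1|n} \\ \vdots \\ \hat{x}_{n|n} \end{bmatrix} = L^{-T}D^{-1}L^{-1}\begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}, \qquad \text{with last row} \quad \hat{x}_{n|n} = \frac{F_{2n-1}}{F_{2n}} \sum_{j=1}^{n} \frac{F_{2j-1}}{F_{2n-1}}\,b_j.$$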
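Unrolling the recursion all the way suggests that the measurement $b_j$ receives the weight $F_{2j-1}/F_{2n}$ in $\hat{x}_{n|n}$. The short Python sketch below tests that reading of (17.39); it is an illustration, with the hypothetical name `unrolled_weights` and with the measurements numbered $b_1, \ldots, b_n$ as in this subsection.

```python
from fractions import Fraction

def fib(k):
    """Fibonacci number F_k with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

def unrolled_weights(n):
    """Weights on b_1, ..., b_n in the filtered estimate, built from recursion (17.39)."""
    weights = [Fraction(1)]                            # n = 1: the estimate is b_1
    for m in range(2, n + 1):
        keep = Fraction(fib(2 * m - 2), fib(2 * m))    # factor on the previous estimate
        gain = Fraction(fib(2 * m - 1), fib(2 * m))    # factor on the new measurement
        weights = [keep * w for w in weights] + [gain]
    return weights

n = 5
print([str(w) for w in unrolled_weights(n)])
print([str(Fraction(fib(2 * j - 1), fib(2 * n))) for j in range(1, n + 1)])
```

The weights sum to 1 because the odd-indexed Fibonacci numbers $F_1 + F_3 + \cdots + F_{2n-1}$ add up to $F_{2n}$, which is consistent with the totally steady case.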


The Gram-Schmidt Factorization 


We mentioned earlier an alternative to the $LDL^T$ factorization of $\mathcal{A}^T\mathcal{A}$. We can orthogonalize the columns of $\mathcal{A}$ itself. This Gram-Schmidt procedure leads to $\mathcal{A} = QR$, where $R$ is upper triangular because of the order of the steps (columns are subtracted from later columns). In the application to Kalman filtering, only neighboring columns of $\mathcal{A}$ have nonzero inner products. Therefore $R$ has only two nonzero diagonals, the main diagonal and the one above. All further entries of $R$ are zero, because column $j$ is already orthogonal to column $k$ for $|j - k| > 1$.

Our convention will have diagonal entries $r_{ii} = 1$. The entries of $R = L^T$ and its inverse were computed in (17.37) and (17.38). So it only remains to find $Q$ (with orthogonal but not orthonormal columns). Once again these columns contain ratios of Fibonacci numbers! We display $\mathcal{A} = QR$ with $n = 3$ columns:

$$\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & 1 \end{bmatrix} =
\begin{bmatrix} 1 & \tfrac{1}{2} & \tfrac{1}{5} \\ -1 & \tfrac{1}{2} & \tfrac{1}{5} \\ 0 & 1 & \tfrac{2}{5} \\ 0 & -1 & \tfrac{3}{5} \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -\tfrac{1}{2} & 0 \\ 0 & 1 & -\tfrac{2}{5} \\ 0 & 0 & 1 \end{bmatrix}.$$

The fractions in $Q$ are $F_i/F_{2j-1}$ for $i < 2j - 1$. The orthogonality of columns 2 and 3 depends on $1^2 + 1^2 + 2^2 = 2 \cdot 3$. The orthogonality of columns $j$ and $j + 1$ requires

$$F_1^2 + F_2^2 + \cdots + F_{2j-1}^2 = F_{2j-1}F_{2j}. \tag{17.40}$$

For proof by induction, add $F_{2j}^2$ to both sides. The right side becomes $(F_{2j-1} + F_{2j})F_{2j} = F_{2j}F_{2j+1}$ which completes the induction. It is pleasant to see the Fibonacci ratios in $Q$.
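The factors $Q$ and $R$ for $n = 3$ can be recomputed by Gram-Schmidt without normalization. The Python sketch below is an illustration only; the helper names are hypothetical and the column layout assumes the same interleaving of observation rows and state rows as in (17.31).

```python
from fractions import Fraction

def steady_columns(n):
    """Columns of the (2n-1) by n steady-model matrix."""
    cols = []
    for j in range(n):
        col = [Fraction(0)] * (2 * n - 1)
        col[2 * j] = Fraction(1)             # observation row for x_j
        if j > 0:
            col[2 * j - 1] = Fraction(1)     # state row  -x_{j-1} + x_j
        if j < n - 1:
            col[2 * j + 1] = Fraction(-1)    # state row  -x_j + x_{j+1}
        cols.append(col)
    return cols

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(cols):
    """Return orthogonal (not orthonormal) columns q_j and R with unit diagonal."""
    n = len(cols)
    qs = []
    r = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j, a in enumerate(cols):
        q = list(a)
        for i, qi in enumerate(qs):
            r[i][j] = dot(qi, a) / dot(qi, qi)       # component of column j along q_i
            q = [x - r[i][j] * y for x, y in zip(q, qi)]
        qs.append(q)
    return qs, r

qs, r = gram_schmidt(steady_columns(3))
for q in qs:
    print([str(x) for x in q])
print([[str(x) for x in row] for row in r])
```

Keeping the columns of $Q$ unnormalized is what leaves the Fibonacci ratios visible; normalizing would introduce square roots and hide them.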

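The Kalman equations can be derived from $\mathcal{A} = QR$ instead of $\mathcal{A}^T\mathcal{A} = LDL^T$. This brings to light new orthogonalities, involving the innovations. Without developing the recursive part (the essence of Kalman), $\mathcal{A}^T\mathcal{A}\hat{\mathbf{x}} = \mathcal{A}^T\mathbf{b}$ gives

$$\mathcal{A}^T(\mathbf{b} - \mathcal{A}\hat{\mathbf{x}}) = 0 \quad\text{and}\quad Q^T(\mathbf{b} - \mathcal{A}\hat{\mathbf{x}}) = 0.$$

The columns of $\mathcal{A}$ are orthogonal to the innovations $\mathbf{b} - \mathcal{A}\hat{\mathbf{x}}$. So are the columns of $Q$. (The equations above differ only by an invertible matrix $R^T$.) In stochastic terms we have zero correlation. The next section approaches the Kalman equations through $LDL^T$.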


17.4 Derivation of the Kalman Filter 


All authors try to find a clear way to derive the formulas for the Kalman filter. Those 
formulas are certainly correct. When we look at the reasoning already supplied by earlier 
authors, it looks much too complicated. Often we don’t completely understand it. There 
is a definite feeling that there must be a better way. This produces almost as many new 
explanations as new books. 

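The present authors are no exception. We will base all steps on two matrix identities. Those identities come from the inverse of a 2 by 2 block matrix. The problem is to update the last entries of $(\mathcal{A}^T\Sigma^{-1}\mathcal{A})^{-1}$, when new rows are added to the big matrix $\mathcal{A}$. Those new rows will be of two kinds, coming from the state equation and the observation equation. A typical filtering step has two updates, from $\mathcal{A}_{k-1}$ to $\mathcal{S}_k$ (by including the state equation) and then to $\mathcal{A}_k$ (by including the observation equation):

$$\mathcal{A}_1 = \begin{bmatrix} A_0 & \\ -F_0 & I \\ & A_1 \end{bmatrix}
\ \longrightarrow\
\mathcal{S}_2 = \begin{bmatrix} A_0 & & \\ -F_0 & I & \\ & A_1 & \\ & -F_1 & I \end{bmatrix}
\ \longrightarrow\
\mathcal{A}_2 = \begin{bmatrix} A_0 & & \\ -F_0 & I & \\ & A_1 & \\ & -F_1 & I \\ & & A_2 \end{bmatrix}.$$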
The step to $\mathcal{S}_k$ adds a new row and column (this is bordering). The second step only adds a new row (this is updating). The least-squares solution $\hat{x}_{k-1|k-1}$ for the old state leads to $\hat{x}_{k|k-1}$ (using the state equation) and then to $\hat{x}_{k|k}$ (using the new observation $b_k$).

Remember that least squares works with the symmetric block tridiagonal matrices $\mathcal{T}_k = \mathcal{A}_k^T\Sigma_k^{-1}\mathcal{A}_k$. So we need the change in $\mathcal{T}_{k-1}$ when $\mathcal{A}_{k-1}$ is bordered and then updated:

$$\text{Bordering by the row } R = [\,0\ \ldots\ -F_{k-1}\ \ I\,] \text{ adds } R^T\Sigma_{\epsilon,k}^{-1}R \tag{17.41}$$

$$\text{Updating by the row } W = [\,0\ \ldots\ 0\ \ A_k\,] \text{ adds } W^T\Sigma_{e,k}^{-1}W. \tag{17.42}$$

We are multiplying matrices using “columns times rows.” Every matrix product $BA$ is the sum of (column $j$ of $B$) times (row $j$ of $A$). In our case $B$ is the transpose of $A$, and these are block rows and columns. The matrices $\Sigma_{\epsilon,k}^{-1}$ and $\Sigma_{e,k}^{-1}$ appear in the middle, because this is weighted least squares.

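The complete system, including all errors up to $\epsilon_k$ and $e_k$ in the error vector $\mathbf{e}_k$, is $\mathcal{A}_k\mathbf{x}_k = \mathbf{b}_k - \mathbf{e}_k$:

$$\mathcal{A}_2\mathbf{x}_2 = \begin{bmatrix} A_0 & & \\ -F_0 & I & \\ & A_1 & \\ & -F_1 & I \\ & & A_2 \end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} b_0 \\ 0 \\ b_1 \\ 0 \\ b_2 \end{bmatrix} -
\begin{bmatrix} e_0 \\ \epsilon_1 \\ e_1 \\ \epsilon_2 \\ e_2 \end{bmatrix} = \mathbf{b}_2 - \mathbf{e}_2. \tag{17.43}$$

Our task is to compute the last corner block $P_{k|k}$ of the matrix $(\mathcal{A}_k^T\Sigma_k^{-1}\mathcal{A}_k)^{-1}$. We do that in two steps. The “prediction” finds $P_{k|k-1}$ from the bordering step, when the state row is added to $\mathcal{A}_{k-1}$ (producing $\mathcal{S}_k$). Then the “correction” finds $P_{k|k}$ when the observation row is included too.

The second task is to compute the prediction $\hat{x}_{k|k-1}$ and correction $\hat{x}_{k|k}$ for the new state vector $x_k$. This is the last component of $\hat{\mathbf{x}}_k$. If we compute all components of $\hat{\mathbf{x}}_k$, then we are smoothing old estimates as well as filtering to find the new $\hat{x}_{k|k}$.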

Naturally $\Sigma$ affects all the update formulas. The derivation of these formulas will be simpler if we begin with the case $\Sigma = I$, in which all noise is “white.” Then we adjust the formulas to account for the weight matrices.

Remark 17.1 Our derivation will be systematic. Probably you will not study every step, but you will know the underlying reasoning. We compute the last entries in $(\mathcal{A}^T\Sigma^{-1}\mathcal{A})^{-1}$ and then in $(\mathcal{A}^T\Sigma^{-1}\mathcal{A})^{-1}\mathcal{A}^T\Sigma^{-1}\mathbf{b}$. These entries are $P_{k|k}$ and $\hat{x}_{k|k}$. Block matrices will be everywhere.




There is another approach to the same result. Instead of working with $\mathcal{A}^T\mathcal{A}$ (which here is block tridiagonal) we could orthogonalize the columns of $\mathcal{A}$. This Gram-Schmidt process factors $\mathcal{A}$ into a block orthogonal $Q$ times a block bidiagonal $R$. Those factors are updated at each step of the square root information filter. This $QR$ approach adds new insight about “orthogonalizing the innovations.”


Block Matrix Identities 


The key formulas give the inverse of a 2 by 2 block matrix, assuming $T$ is invertible:

$$\begin{bmatrix} T & U \\ V & W \end{bmatrix}^{-1} = \begin{bmatrix} L & M \\ N & P \end{bmatrix}. \tag{17.44}$$

Our applications have symmetric matrices. All blocks on the diagonal are symmetric; the blocks off the diagonal are $U = V^T$ and $M = N^T$. The key to Kalman’s success is that the matrix to be inverted is block tridiagonal. Thus $U$ and $V$ are nonzero only in their last block entries, and $T$ itself is block tridiagonal. But the general formula gives the inverse without using any special properties of $T, U, V, W$:

$$\begin{aligned}
L &= T^{-1} + T^{-1}UPVT^{-1} \\
M &= -T^{-1}UP \\
N &= -PVT^{-1} \\
P &= (W - VT^{-1}U)^{-1}.
\end{aligned} \tag{17.45}$$


The simplest proof is to multiply matrices and obtain $I$. The actual derivation of (17.45) is by block elimination. Multiply the row $[\,T\ \ U\,]$ by $VT^{-1}$ and subtract from $[\,V\ \ W\,]$:

$$\begin{bmatrix} I & 0 \\ -VT^{-1} & I \end{bmatrix}
\begin{bmatrix} T & U \\ V & W \end{bmatrix} =
\begin{bmatrix} T & U \\ 0 & W - VT^{-1}U \end{bmatrix}. \tag{17.46}$$

The two triangular matrices are easily inverted, and then block multiplication produces (17.45). This is only Gaussian elimination with blocks. In the scalar case the last corner is $W - VU/T$. In the matrix case we keep the blocks in the right order! The inverse of that last entry is $P$.

Now make a trivial but valuable observation. We could eliminate in the opposite order. This means that we subtract $UW^{-1}$ times the second row $[\,V\ \ W\,]$ from the first row $[\,T\ \ U\,]$:

$$\begin{bmatrix} I & -UW^{-1} \\ 0 & I \end{bmatrix}
\begin{bmatrix} T & U \\ V & W \end{bmatrix} =
\begin{bmatrix} T - UW^{-1}V & 0 \\ V & W \end{bmatrix}.$$


Inverting this new right side yields different (but still correct) formulas for the blocks $L$, $M$, $N$, $P$ in the inverse matrix. We pay particular attention to the $(1, 1)$ block. It becomes $L = (T - UW^{-1}V)^{-1}$. That is completely parallel to $P = (W - VT^{-1}U)^{-1}$, just changing letters.

Now compare the new form of L with the form in (17.45). Their equality is the most 
important formula in matrix update theory. We only mention four of its originators. 




Sherman-Morrison-Woodbury-Schur formula

$$(T - UW^{-1}V)^{-1} = T^{-1} + T^{-1}U\,(W - VT^{-1}U)^{-1}\,VT^{-1}. \tag{17.47}$$

We are looking at this as an update formula, when the matrix $T$ is perturbed by $UW^{-1}V$. Often this is a perturbation of low rank. Previously we looked at (17.45) as a bordering formula, when the matrix $T$ was bordered by $U$ and $V$ and $W$. Well, matrix theory is beautiful.
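A rank-one special case of the update formula is easy to test in exact arithmetic. The Python sketch below is an illustration, not part of the text: it adds $ee^T$ (with $e$ the last unit vector) to the 3 by 3 steady-model matrix with diagonal 2, 3, 2, which raises the last diagonal entry to 3, and compares the updated inverse with a direct inversion. The helper names `inverse` and `rank_one_update_last` are hypothetical.

```python
from fractions import Fraction

def inverse(m):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def rank_one_update_last(v_inv):
    """Inverse of V + e e^T from V^{-1}, for symmetric V and e = last unit vector."""
    n = len(v_inv)
    col = [v_inv[i][n - 1] for i in range(n)]    # V^{-1} e
    p = v_inv[n - 1][n - 1]                      # e^T V^{-1} e
    return [[v_inv[i][j] - col[i] * col[j] / (1 + p) for j in range(n)]
            for i in range(n)]

v = [[2, -1, 0], [-1, 3, -1], [0, -1, 2]]
v_plus = [[2, -1, 0], [-1, 3, -1], [0, -1, 3]]   # v with 1 added to the last entry
updated = rank_one_update_last(inverse(v))
direct = inverse(v_plus)
print(updated == direct, updated[2][2])
```

Only the last column of the old inverse and one scalar division are needed, which is the practical appeal of treating a new row as a low-rank update.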


Updates of the Covariance Matrices 


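We now compute the blocks $P_{k|k-1}$ and $P_{k|k}$ in the lower right corners of $(\mathcal{S}_k^T\mathcal{S}_k)^{-1}$ and $(\mathcal{A}_k^T\mathcal{A}_k)^{-1}$. Then we operate on the right side, to find the new state $\hat{x}_{k|k}$.

Remember that $\mathcal{S}_k$ comes by adding the new row $[\,V\ \ I\,] = [\,0\ \ldots\ -F_{k-1}\ \ I\,]$. Then $\mathcal{A}_{k-1}^T\mathcal{A}_{k-1}$ grows to $\mathcal{S}_k^T\mathcal{S}_k$ by adding $[\,V\ \ I\,]^T[\,V\ \ I\,]$:

$$\mathcal{S}_k^T\mathcal{S}_k = \begin{bmatrix} \mathcal{A}_{k-1}^T\mathcal{A}_{k-1} + V^TV & V^T \\ V & I \end{bmatrix}. \tag{17.48}$$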


Because $V$ is zero until its last block, equation (17.48) is really the addition of a 2 by 2 block in the lower right corner. This has two effects at once. It perturbs the existing matrix $T = \mathcal{A}_{k-1}^T\mathcal{A}_{k-1}$ and it borders the result $T_{\text{new}}$. The perturbation is by $V^TV$. The bordering is by the row $V$ and the column $V^T$ and the corner block $I$. Therefore the update formula for $T_{\text{new}}^{-1}$ goes inside the bordering formula:

$$\text{Update to } T + V^TV: \quad T_{\text{new}}^{-1} = T^{-1} - T^{-1}V^T(I + VT^{-1}V^T)^{-1}VT^{-1}$$

$$\text{Border by } V, V^T, I: \quad P = (I - VT_{\text{new}}^{-1}V^T)^{-1}.$$

Substitute the update into the bordering formula and write $Z$ for the block $VT^{-1}V^T$. This gives the great simplification $P = I + Z$:

$$\begin{aligned}
P &= \bigl(I - V\bigl(T^{-1} - T^{-1}V^T(I + Z)^{-1}VT^{-1}\bigr)V^T\bigr)^{-1} \\
  &= \bigl(I - Z + Z(I + Z)^{-1}Z\bigr)^{-1} = I + Z.
\end{aligned} \tag{17.49}$$
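The last equality in (17.49) is stated without its intermediate step. One way to check it, offered here as a worked verification rather than as the authors' own argument, is to multiply the bracketed matrix by $I + Z$ and use the fact that $Z$ commutes with $(I + Z)^{-1}$:

```latex
\begin{aligned}
(I + Z)\bigl(I - Z + Z(I + Z)^{-1}Z\bigr)
  &= (I + Z)(I - Z) + (I + Z)\,Z\,(I + Z)^{-1}Z \\
  &= I - Z^2 + Z\,(I + Z)(I + Z)^{-1}Z \\
  &= I - Z^2 + Z^2 = I .
\end{aligned}
```

So the bracketed matrix is $(I + Z)^{-1}$ and its inverse is $I + Z$. In the scalar case this reads $1 - z + z^2/(1 + z) = 1/(1 + z)$.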


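This block $P$ is $P_{k|k-1}$. It is the corner block in $(\mathcal{S}_k^T\mathcal{S}_k)^{-1}$. The row $[\,V\ \ I\,]$ has been included, and the matrix $Z$ is

$$VT^{-1}V^T = \begin{bmatrix} 0 & \cdots & -F_{k-1} \end{bmatrix}
\begin{bmatrix} \ddots & \vdots \\ \cdots & P_{k-1|k-1} \end{bmatrix}
\begin{bmatrix} 0 \\ \vdots \\ -F_{k-1}^T \end{bmatrix} = F_{k-1}P_{k-1|k-1}F_{k-1}^T.$$

Therefore $P = I + Z$ in equation (17.49) is exactly the Kalman update formula

$$P_{k|k-1} = I + F_{k-1}P_{k-1|k-1}F_{k-1}^T. \tag{17.50}$$

The identity matrix $I$ will change to $\Sigma_{\epsilon,k}$ when the covariance of the state equation error $\epsilon_k$ is accounted for.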
Now comes the second half of the update. The new row $W = [\,0\ \ldots\ 0\ \ A_k\,]$ enters from the observation equations. This row is placed below $\mathcal{S}_k$ to give $\mathcal{A}_k$. Therefore $W^TW$ is added to $\mathcal{S}_k^T\mathcal{S}_k$ to give $\mathcal{A}_k^T\mathcal{A}_k$. We write $Y$ for the big tridiagonal matrix $\mathcal{S}_k^T\mathcal{S}_k$ before this update, and we use the update formula (17.47) for the new inverse:

$$(\mathcal{A}_k^T\mathcal{A}_k)^{-1} = (Y + W^TIW)^{-1} = Y^{-1} - Y^{-1}W^T(I + WY^{-1}W^T)^{-1}WY^{-1}. \tag{17.51}$$

Look at $WY^{-1}W^T$. The row $W = [\,0\ \ldots\ 0\ \ A_k\,]$ is zero until the last block. The last corner of $Y^{-1}$ is $P_{k|k-1}$, found above. Therefore $WY^{-1}W^T$ reduces immediately to $A_kP_{k|k-1}A_k^T$. Similarly the last block of $Y^{-1}W^T$ is $P_{k|k-1}A_k^T$.

Now concentrate on the last block row in equation (17.51). Factoring out $Y^{-1}$ on the right, this last row of $(\mathcal{A}_k^T\mathcal{A}_k)^{-1}$ is

$$\begin{aligned}
&\bigl(I - P_{k|k-1}A_k^T(I + A_kP_{k|k-1}A_k^T)^{-1}A_k\bigr)\,(\text{last row of } Y^{-1}) \\
&\qquad = (I - K_kA_k)\,(\text{last row of } (\mathcal{S}_k^T\mathcal{S}_k)^{-1}).
\end{aligned} \tag{17.52}$$

In particular the final entry of this last row is the corner entry $P_{k|k}$ in $(\mathcal{A}_k^T\mathcal{A}_k)^{-1}$:

$$P_{k|k} = (I - K_kA_k)P_{k|k-1}. \tag{17.53}$$


This is the second half of the Kalman filter. It gives the error covariance $P_{k|k}$ from $P_{k|k-1}$. The new $P$ uses the observation equation $b_k = A_kx_k + e_k$, whereas the old $P$ doesn’t use it. The matrix $K$ that was introduced to simplify the algebra in (17.52) is Kalman’s gain matrix:

$$K_k = P_{k|k-1}A_k^T\,(I + A_kP_{k|k-1}A_k^T)^{-1}. \tag{17.54}$$

This identity matrix $I$ will change to $\Sigma_{e,k}$ when the covariance of the observation error is accounted for. You will see in equation (17.63) why we concentrated on the whole last row of $(\mathcal{A}_k^T\mathcal{A}_k)^{-1}$, as well as its final entry $P_{k|k}$ in (17.53).


The Gain Matrix and the Weights 


In a moment we will complete the filter by estimating the new state vector $x_k$:

Prediction: $\hat{x}_{k|k-1} = F_{k-1}\hat{x}_{k-1|k-1}$
Correction: $\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\,(b_k - A_k\hat{x}_{k|k-1})$.

We do not put boxes around those formulas until they are derived. But we want to show that these state estimates are consistent with the prediction $P_{k|k-1}$ and correction $P_{k|k}$ for their error covariance matrices.

The prediction $\hat{x}_{k|k-1}$ is easy. Actually it could come before or after $P_{k|k-1}$; you will see that the reasoning is very straightforward. The point to note is that by the ordinary propagation law, the state $x_k = F_{k-1}x_{k-1} + \epsilon_k$ has the error covariance matrix

$$P_{k|k-1} = F_{k-1}P_{k-1|k-1}F_{k-1}^T + \Sigma_{\epsilon,k}. \tag{17.55}$$


This is our formula (17.50), corrected by including $\Sigma_{\epsilon,k}$ instead of the identity matrix $I$. We could also have included $\Sigma_{\epsilon,k}$ in our first derivation of (17.50). The update matrix added in equation (17.48) would become $[\,V\ \ I\,]^T\Sigma_{\epsilon,k}^{-1}[\,V\ \ I\,]$. The long matrix (17.49) changes to

$$P = \bigl(\Sigma_{\epsilon,k}^{-1} - \Sigma_{\epsilon,k}^{-1}Z\Sigma_{\epsilon,k}^{-1} + \Sigma_{\epsilon,k}^{-1}Z(\Sigma_{\epsilon,k} + Z)^{-1}Z\Sigma_{\epsilon,k}^{-1}\bigr)^{-1}.$$

And this happily reduces to $Z + \Sigma_{\epsilon,k}$ which is the covariance (17.55) with weight matrix included.

Now turn to the corrected state estimate $\hat{x}_{k|k} = (I - K_kA_k)\hat{x}_{k|k-1} + K_kb_k$. Again we use the ordinary propagation law to find the variance of $\hat{x}_{k|k}$:

$$P_{k|k} = (I - K_kA_k)P_{k|k-1}(I - K_kA_k)^T + K_k\Sigma_{e,k}K_k^T. \tag{17.56}$$


This is a new form of the covariance update. Unlike equation (17.53), this “Joseph form” is clearly symmetric. It is a sum of two positive definite matrices (so that property is never lost by numerical error). The computation is a little slower, but the form (17.56) is preferred in many calculations. Kalman and Bucy noted that a perturbation $\Delta K$ in the gain gives only a second-order change in the Joseph form, while it gives a first-order change $-\Delta K A_kP_{k|k-1}$ in the simpler form of equation (17.53):

$$P_{k|k} = (I - K_kA_k)P_{k|k-1}. \tag{17.57}$$

The equality of those two forms of $P_{k|k}$ must come from manipulation of the expression for Kalman’s matrix $K_k$. So we turn to that gain matrix. We also insert the correct weight matrix $\Sigma_{e,k}$ into its formula.


We introduced the matrix $K_k$ in (17.54) to simplify equation (17.52). If the covariance matrix $\Sigma_{e,k}$ had been included in (17.52), then it would have been included in the gain matrix. The correct form is

$$K_k = P_{k|k-1}A_k^T\,(\Sigma_{e,k} + A_kP_{k|k-1}A_k^T)^{-1}. \tag{17.58}$$

The reason is that the first identity $I$ in (17.51) should have been $\Sigma_{e,k}^{-1}$. Then the second $I$ would have been $\Sigma_{e,k}$. Now we manipulate that expression (17.58). It is equivalent to

$$P_{k|k-1}A_k^T = K_k\,(\Sigma_{e,k} + A_kP_{k|k-1}A_k^T). \tag{17.59}$$

Shifting terms to the left side this becomes

$$(I - K_kA_k)P_{k|k-1}A_k^T = K_k\Sigma_{e,k}. \tag{17.60}$$

Now substitute (17.60) into the Joseph form (17.56) for $P_{k|k}$:

$$P_{k|k} = (I - K_kA_k)P_{k|k-1} - \bigl[(I - K_kA_k)P_{k|k-1}A_k^T\bigr]K_k^T + K_k\Sigma_{e,k}K_k^T = (I - K_kA_k)P_{k|k-1}.$$

The bracketed factor is the left side of (17.60), so the last two terms cancel. This is the “unsymmetric” form (17.53) of $P_{k|k}$. But that product $P_{k|k}$ must be symmetric!


Note The gain matrix $K$ also has an important optimality property. The state update formula $(I - K_kA_k)\hat{x}_{k|k-1} + K_kb_k$ led directly to the Joseph form (17.56) for $P_{k|k}$. We could choose the gain matrix in that update so as to minimize $P_{k|k}$. Since Gauss chose weights for the same purpose, the gain matrix $K$ brings Kalman into agreement with Gauss.

Recall from Section 9.4 the proof by Gauss that the covariance of $\hat{x}$ is smallest when the weight is $\Sigma^{-1}$. A similar proof shows that $K_k$ minimizes (17.56) because it satisfies (17.60). The first-order change in $P_{k|k}$ due to $\Delta K$ in (17.56) is

$$(I - K_kA_k)P_{k|k-1}(-A_k^T\Delta K^T) + K_k\Sigma_{e,k}\Delta K^T + (\ldots)^T = 0.$$

This is exactly the comment made earlier, that a perturbation $\Delta K$ produces only a second-order change in $P_{k|k}$. When we examine that change $\Delta P = \Delta K\,(A_kP_{k|k-1}A_k^T + \Sigma_{e,k})\,\Delta K^T$ we see that it is upwards (positive definite). So Kalman’s gain matrix $K_k$ does minimize the covariance $P_{k|k}$. This is perhaps the neatest proof of the Kalman update formulas.
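Both claims, the agreement of the Joseph form with the simpler form at the optimal gain and the purely second-order increase under a perturbed gain, can be checked on a scalar example. The Python sketch below is an illustration; the function names `joseph` and `optimal_gain` and the sample values ($P_{k|k-1} = 5/3$, $A_k = 1$, $\Sigma_{e,k} = 1$, perturbation $1/10$) are assumptions chosen for the example.

```python
from fractions import Fraction

def joseph(p_pred, a, var_obs, gain):
    """Scalar Joseph form: (1 - K a) P (1 - K a) + K var_obs K."""
    return (1 - gain * a) * p_pred * (1 - gain * a) + gain * var_obs * gain

def optimal_gain(p_pred, a, var_obs):
    """Scalar gain: P a / (var_obs + a P a)."""
    return p_pred * a / (var_obs + a * p_pred * a)

# Hypothetical sample values.
p_pred, a, var_obs = Fraction(5, 3), Fraction(1), Fraction(1)
k_opt = optimal_gain(p_pred, a, var_obs)
simple = (1 - k_opt * a) * p_pred                 # the simpler covariance form
delta = Fraction(1, 10)                           # perturbation of the gain
increase = joseph(p_pred, a, var_obs, k_opt + delta) - joseph(p_pred, a, var_obs, k_opt)
second_order = delta * (a * p_pred * a + var_obs) * delta

print(k_opt, simple, joseph(p_pred, a, var_obs, k_opt))
print(increase, second_order)
```

The two printed numbers on the last line agree exactly, so no first-order term survives, and the same increase results from a perturbation of either sign.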

We note that the quantity most often minimized in the Kalman filter literature is the trace of $P$. This is the expected value of the scalar $(x - \hat{x})^T(x - \hat{x})$; its minimization leads to the same gain matrix $K$. But it seems more informative (and just as easy) to minimize the matrix $P = E\{(x - \hat{x})(x - \hat{x})^T\}$. We just proved that any other choice of $K$ (that is, any movement $\Delta K$) would increase $P$ by a positive definite matrix $\Delta P$.


Now we systematically derive the state updates $\hat{x}_{k|k-1}$ and $\hat{x}_{k|k}$ as solutions of the overall least-squares problem. And we mention that all our derivations could be expressed in stochastic terms, with expectations instead of matrix equations.


State Updates $\hat{x}_{k|k-1}$ and $\hat{x}_{k|k}$


The left side of the big least-squares problem is now dealt with. We know the last row of $(\mathcal{A}_k^T\Sigma_k^{-1}\mathcal{A}_k)^{-1}$. Now we multiply by the right side $\mathcal{A}_k^T\Sigma_k^{-1}\mathbf{b}_k$, when $\mathbf{b}_k$ includes the newest observation $b_k$.

The predicted value of $x_k$ (before that observation) is simple to understand. Only the state equation has been added to the system. We can solve it exactly by

$$\hat{x}_{k|k-1} = F_{k-1}\hat{x}_{k-1|k-1}. \tag{17.61}$$


This is our best estimate of $x_k$ based on the state equation and the old observations. It solves the new state equation exactly, and it keeps the best solution to the earlier equations. So it maintains the correct least-squares solution, when the new row and column are added to the system.

Now we include the new observation. This changes everything. The earlier estimates $\hat{x}_{i|k-1}$ are “smoothed” in the new $\hat{x}_{i|k}$. We leave those smoothing formulas (for $i < k$) until later. The predicted $\hat{x}_{k|k-1}$ in (17.61) changes to a corrected value $\hat{x}_{k|k}$. This is what we compute now. It is the last component of the weighted least-squares solution to the complete system $\mathcal{A}_k\mathbf{x}_k \approx \mathbf{b}_k$:

$$\begin{bmatrix} A_0 & & & \\ -F_0 & I & & \\ & A_1 & & \\ & & \ddots & \\ & & A_{k-1} & \\ & & -F_{k-1} & I \\ & & & A_k \end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_k \end{bmatrix} \approx
\begin{bmatrix} b_0 \\ 0 \\ b_1 \\ \vdots \\ b_{k-1} \\ 0 \\ b_k \end{bmatrix} = \mathbf{b}_k. \tag{17.62}$$
Table 17.2 The equations for the Kalman filter

System equation                       $x_k = F_{k-1}x_{k-1} + \epsilon_k$,  $\epsilon_k \sim N(0, \Sigma_{\epsilon,k})$
Observation equation                  $b_k = A_k x_k + e_k$,  $e_k \sim N(0, \Sigma_{e,k})$
Initial conditions                    $E\{x_0\} = \hat{x}_{0|0}$
                                      $E\{(x_0 - \hat{x}_{0|0})(x_0 - \hat{x}_{0|0})^T\} = P_{0|0}$
Other conditions                      $E\{\epsilon_k e_j^T\} = 0$, for all $k$, $j$
Prediction of the state vector        $\hat{x}_{k|k-1} = F_{k-1}\hat{x}_{k-1|k-1}$
Prediction of the covariance matrix   $P_{k|k-1} = F_{k-1}P_{k-1|k-1}F_{k-1}^T + \Sigma_{\epsilon,k}$
The Kalman gain matrix                $K_k = P_{k|k-1}A_k^T(A_k P_{k|k-1}A_k^T + \Sigma_{e,k})^{-1}$
Filtering of state vector             $\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k(b_k - A_k\hat{x}_{k|k-1})$
Covariance matrix for filtering       $P_{k|k} = (I - K_k A_k)P_{k|k-1}$
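
The steps in Table 17.2 can be traced in a few lines of MATLAB. The following is only a minimal sketch for a two-state example: the matrices, the covariances, and the observation are made-up values chosen for illustration, and the variable names are not taken from the book's M-files.

```matlab
% One predict/update cycle following Table 17.2.
% All numerical values below are hypothetical, chosen for illustration.
F       = [1 1; 0 1];        % state transition F_{k-1}
A       = [1 0];             % observation matrix A_k
Sig_eps = 0.01*eye(2);       % system noise covariance
Sig_e   = 0.25;              % observation noise covariance
x_filt  = [0; 1];            % previous filtered state
P_filt  = eye(2);            % previous filtered covariance
b       = 1.2;               % new observation b_k

% Prediction of state and covariance
x_pred = F*x_filt;
P_pred = F*P_filt*F' + Sig_eps;

% Kalman gain
K = P_pred*A' / (A*P_pred*A' + Sig_e);

% Filtering: prediction plus gain times innovation
innovation = b - A*x_pred;
x_new = x_pred + K*innovation;
P_new = (eye(2) - K*A)*P_pred;
```

The filtered covariance `P_new` has a smaller trace than `P_pred`, which reflects the information gained from the observation.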


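The least-squares solution is always $\hat{\mathbf{x}}_k = (\mathcal{A}_k^T \Sigma_k^{-1} \mathcal{A}_k)^{-1} \mathcal{A}_k^T \Sigma_k^{-1} \mathbf{b}_k$. We want the last
block $\hat{x}_{k|k}$ in this least-squares solution. So we use the last block row of $(\mathcal{A}_k^T \Sigma_k^{-1} \mathcal{A}_k)^{-1}$
from equation (17.52):


$$
\begin{aligned}
\hat{x}_{k|k} &= \bigl(\text{last row of } (\mathcal{A}_k^T \Sigma_k^{-1} \mathcal{A}_k)^{-1}\bigr)\, \mathcal{A}_k^T \Sigma_k^{-1} \mathbf{b}_k \\
&= (I - K_k A_k)\bigl(\text{last row of } (\mathcal{S}_k^T \Sigma_{k-1}^{-1} \mathcal{S}_k)^{-1}\bigr)\, \mathcal{A}_k^T \Sigma_k^{-1} \mathbf{b}_k.
\end{aligned}
\tag{17.63}
$$


We start with $\mathbf{b}_k$ on the right side, and carry out each multiplication in this equation.
Separate the old observations in $\mathbf{b}_{k-1}$ from the new $b_k$, and multiply by $\Sigma_k^{-1}$:


$$
\Sigma_k^{-1}\mathbf{b}_k = \begin{bmatrix} \Sigma_{k-1}^{-1}\mathbf{b}_{k-1} \\ \Sigma_{e,k}^{-1} b_k \end{bmatrix}.
$$


Multiply next by $\mathcal{A}_k^T = [\,\mathcal{S}_k^T \;\; W^T\,]$ and recall that $W = [\,0 \;\dots\; 0 \;\; A_k\,]$:

$$\mathcal{A}_k^T \Sigma_k^{-1} \mathbf{b}_k = \mathcal{S}_k^T \Sigma_{k-1}^{-1} \mathbf{b}_{k-1} + A_k^T \Sigma_{e,k}^{-1} b_k. \tag{17.64}$$


Now multiply by the last row of $(\mathcal{S}_k^T \Sigma_{k-1}^{-1} \mathcal{S}_k)^{-1}$. This produces the least-squares solution
$\hat{x}_{k|k-1}$ in the old $k - 1$ part. Watch what it produces in the new part:


$$
\bigl(\text{last row of } (\mathcal{S}_k^T \Sigma_{k-1}^{-1} \mathcal{S}_k)^{-1}\bigr)\bigl(\mathcal{S}_k^T \Sigma_{k-1}^{-1} \mathbf{b}_{k-1} + A_k^T \Sigma_{e,k}^{-1} b_k\bigr) = \hat{x}_{k|k-1} + P_{k|k-1} A_k^T \Sigma_{e,k}^{-1} b_k. \tag{17.65}
$$


Finally equation (17.63) multiplies this by $(I - K_k A_k)$ to yield $\hat{x}_{k|k}$:

$$\hat{x}_{k|k} = (I - K_k A_k)\hat{x}_{k|k-1} + K_k b_k. \tag{17.66}$$


That final term used the identity (17.60) to replace $(I - K_k A_k)P_{k|k-1}A_k^T$ by $K_k \Sigma_{e,k}$.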
This completes the sequence of Kalman filter update equations. As we hoped, formula
(17.66) for $\hat{x}_{k|k}$ can be expressed as the prediction $\hat{x}_{k|k-1}$ plus a correction:


$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k(b_k - A_k\hat{x}_{k|k-1}). \tag{17.67}$$


The correction is Kalman's gain matrix times the innovation.
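
For readers who want the two small algebra steps spelled out, the following worked lines sketch why the identity quoted from (17.60) holds and how (17.66) turns into (17.67). They use only the gain formula of Table 17.2; nothing beyond that is assumed.

```latex
% From the gain formula: K_k (A_k P_{k|k-1} A_k^T + \Sigma_{e,k}) = P_{k|k-1} A_k^T
\begin{aligned}
(I - K_k A_k) P_{k|k-1} A_k^T
  &= P_{k|k-1} A_k^T - K_k A_k P_{k|k-1} A_k^T \\
  &= K_k \bigl(A_k P_{k|k-1} A_k^T + \Sigma_{e,k}\bigr) - K_k A_k P_{k|k-1} A_k^T
   = K_k \Sigma_{e,k}, \\[4pt]
\hat{x}_{k|k}
  &= (I - K_k A_k)\hat{x}_{k|k-1} + K_k b_k
   = \hat{x}_{k|k-1} + K_k \bigl(b_k - A_k \hat{x}_{k|k-1}\bigr).
\end{aligned}
```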


17.5 Bayes Filter for Batch Processing 


The steps of the Kalman filter are prediction-gain matrix-correction. For the state vector
this is the natural order. From the prediction $\hat{x}_{k|k-1}$ and the gain matrix $K_k$ and the new
observation $b_k$, we compute the filtered estimate $\hat{x}_{k|k}$. This uses $b_k$ from the right side of
the normal equations. But the hard part (the computationally intensive part) is always on
the left side, where we are factoring the big matrix $\mathcal{A}^T \Sigma^{-1} \mathcal{A}$ and updating the covariance
to $P_{k|k}$. The Kalman filter creates the gain matrix $K$ before the covariance $P$, but this is
not the only possible order. We want to consider if it is the best order.

Suppose we compute $P_{k|k}$ before the gain matrix. Then we use the gain matrix
for $\hat{x}_{k|k}$. This is the one point where $K$ is actually needed, to multiply the innovation
$b - A\hat{x}$ and update the state vector in (17.67). The new observation is that we can go
directly from the prediction $P_{k|k-1}$ to the correction $P_{k|k}$:


$$P_{k|k} = \bigl(P_{k|k-1}^{-1} + A_k^T \Sigma_{e,k}^{-1} A_k\bigr)^{-1}. \tag{17.68}$$


This is straightforward when we remember that the matrices $P^{-1}$ are the (block) pivots of
$\mathcal{A}^T \Sigma^{-1} \mathcal{A}$. The new row $[\,0 \;\dots\; 0 \;\; A_k\,]$ simply adds $A_k^T \Sigma_{e,k}^{-1} A_k$ to the $(k, k)$ block and
therefore to that pivot. This is equation (17.68).

$P^{-1}$ is called the information matrix. It increases as the covariance $P$ decreases
(then the information gets better). It is sometimes more economical to work directly with
the pivot, which is $P^{-1}$, than with $P$. The inverse of (17.68) is


$$P_{k|k}^{-1} = P_{k|k-1}^{-1} + A_k^T \Sigma_{e,k}^{-1} A_k. \tag{17.69}$$

In either case the gain matrix (it now comes after $P$ or $P^{-1}$) has the new formula

$$K_k = P_{k|k} A_k^T \Sigma_{e,k}^{-1}. \tag{17.70}$$


To verify that this is the correct gain matrix, remember that the least-squares solution is
$(\mathcal{A}^T \Sigma^{-1} \mathcal{A})^{-1} \mathcal{A}^T \Sigma^{-1} \mathbf{b}$. The matrix multiplying $b_k$ in the last block is $P_{k|k}$ (from the big
inverse matrix) times $A_k^T$ (from $\mathcal{A}^T$) times $\Sigma_{e,k}^{-1}$. This is exactly formula (17.70) for the
gain matrix.

To repeat: We can compute the covariance $P_{k|k}$ before the gain. Morrison (1969)
calls this the "Bayes filter," although this name is perhaps not widely used. The formulas
can be derived from the Bayes theorem about conditional expectation (just as the Kalman
formulas can be derived from maximum likelihood). We could see the two forms more
directly, as coming from the two sides of the Sherman-Morrison-Woodbury-Schur matrix
identity. And we could see them in a deeper way as coming from dual optimization
problems.
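
The agreement of the two orders of computation can be checked numerically. The sketch below uses made-up matrices (two states, three observations) and compares the Kalman form, which inverts a 3 by 3 matrix, with the Bayes form of (17.68) and (17.70), which inverts 2 by 2 matrices. The variable names are illustrative only.

```matlab
% Compare the Kalman order (gain first) with the Bayes order (covariance first).
% The numbers are hypothetical, chosen for illustration.
P_pred = [2 1; 1 3];          % predicted covariance P_{k|k-1}
A      = [1 0; 1 1; 0 2];     % three observations of two states
Sig_e  = diag([0.5 1 2]);     % observation covariance

% Kalman form: the inverted matrix has the size of the observation vector
K_kal = P_pred*A' / (A*P_pred*A' + Sig_e);
P_kal = (eye(2) - K_kal*A)*P_pred;

% Bayes form: the inverted matrices have the size of the state vector
P_bay = inv(inv(P_pred) + A'*(Sig_e\A));   % (17.68)
K_bay = P_bay*A' / Sig_e;                  % (17.70)

diff_P = norm(P_kal - P_bay);
diff_K = norm(K_kal - K_bay);
```

Both differences are at roundoff level, as the matrix identity predicts.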
We now focus on the practical questions. Which form is more efficient? Which form
is more stable? Since the Kalman form is the most famous and frequently used, we expect 
that it usually wins. But not always. 

The computational cost is greatest when we invert matrices. The Bayes form inverts
$P$, whose order equals the number of state variables. The Kalman form inverts
$\Sigma_{e,k} + A_k P_{k|k-1} A_k^T$, whose order equals the number of new observations. If frequent
views of the state vector are required, then Bayes is expensive and Kalman is better.

The Kalman form is best for immediate updates. The Bayes form has advantages 
for batch processing. If we want to update the state as soon as possible—to give a better 
prediction in a nonlinear state equation, or to reach a quicker control decision—then we 
need Kalman. If we can collect a larger batch of observations before using them to update 
the state, we can choose Bayes. The gain matrix is only needed for the state update. 

Let us mention some other advantages of the Bayes form (covariance before gain). 
The significance depends on the particular application: 


1 The Bayes filter can start with no a priori information on $x_0$, by setting $P_{0|0}^{-1} = 0$.


In the Kalman form this would require $P_{0|0} = \infty$. Substituting a large initial variance
is certainly possible, and very common. Morrison (1969), page 472, discusses
the difficulties with this approach; it affects the later estimates, which does not happen
with $P_{0|0}^{-1} = 0$ in Bayes. And there is a further difficulty to tune the large $P_{0|0}$ to
the word-length of the computer.


2 If the covariance matrix $P$ becomes too small, there are again numerical difficulties.
(It may become indefinite, for one.) And Kalman updates run into trouble when the
data is highly precise and its covariance $\Sigma_{e,k}$ is nearly zero. This matrix could be
lost to roundoff error when $\Sigma_{e,k} + A_k P_{k|k-1} A_k^T$ is inverted. The Bayes form remains
successful provided $A_k$ has full column rank. Its new estimate uses heavily the new
data, which is reliable because $\Sigma_{e,k}$ is small.


Recall that the square-root filter was created to avoid the worst of these possible
difficulties. The positive-definiteness of a matrix $P$ is guaranteed if we express it
in the form $R^T R$. The reason is that $x^T R^T R x$ could never be negative, because it
equals $\|Rx\|^2$. So the numerically stable approach to filtering is to use a "square
root" of $P$ or $P^{-1}$.


Note that $A = QR$ gives $A^T A = R^T R$. Thus the factorization into $QR$ by Gram-Schmidt
orthogonalization is the entrance to the square root filter. But the overhead
of working with this $Q$ matrix is serious! So the square-root approach is more expensive
(and may not be needed), compared to plain Kalman.


And we must also note disadvantages of Bayes: 


1 The number of unknown states (the dimension of $x_k$) is fixed. We cannot conveniently
account for a satellite that rises or sets. If a parameter becomes unimportant
during the processing, we have to stay with it.


2 For quick updates, Bayes is more expensive. 
Both filters must run into difficulty (and they do) if observation errors are perfectly
correlated. In this case $\Sigma_{e,k}$ is singular. Bayes requires its inverse, so at least it identifies
the problem immediately. Kalman proceeds forward, without trying to invert, but it finds
singular covariance matrices $P_{k|k}$. It is this danger (and also near singularity) that the
square root filter or the Deyst filter, Deyst (1969), can handle better.

We don’t mean to be such pessimists! The Kalman filter gives state estimates (and 
position estimates for GPS) of high accuracy in a wide variety of important problems. It 
reduces the error while it computes the covariance. Those are both extremely valuable in 
practice. 


Remark 17.2 After writing these sections we discovered an interesting paper by Duncan 
& Horn (1972). That paper was written to overcome a “communications block” between 
engineering and statistics. We paraphrase their introduction: Despite a basic closeness to 
regression and a high utility for providing simple solutions, little of Kalman’s work has 
reached the average statistician—because it was developed from conditional probability 
theory and not regression theory. 

Duncan and Horn verify the Kalman filter equations and establish the key property
that $E\{(x - \hat{x}_{k|k})\,b^T\} = 0$. Their argument extends the Gauss-Markov Theorem from
regression theory, which identifies optimal estimators. Their reasoning will be familiar 
to statisticians. To a non-expert in both fields (wide sense conditional probability and 
regression), the latter approach is simpler but still far from transparent. We hope that our 
straightforward and more ponderous derivation (slogging through the normal equations) 
will be helpful to a third group. We do not attempt to define this third group, who prefer 
block tridiagonal matrices to subtle and useful insights about uncorrelated variables. 

The common step in all derivations is our favorite matrix identity: 


$$P_{k|k} = (T + A^T \Sigma_e^{-1} A)^{-1} = T^{-1} - T^{-1} A^T (\Sigma_e + A T^{-1} A^T)^{-1} A T^{-1}. \tag{17.71}$$


Duncan and Horn choose the Bayes form as more intuitive. That form uses the left side
of (17.71), with $T = P_{k|k-1}^{-1}$ as the predicted inverse covariance. The Kalman form uses
the right side of (17.71). You see again how Kalman inverts a matrix of size given by the
number of observations, where the left side involves the number of state components (just
count rows in $A$ and $A^T$).


Remark 17.3 Paige & Saunders (1977) have proposed a Kalman algorithm based on
orthogonalization of the columns of the big matrix $M = \Sigma^{-1/2}\mathcal{A}$. By including $\Sigma^{-1/2}$,
which is found one block at a time since $\Sigma$ is block diagonal, they can simplify to unit
covariance. The main point is to create $Q$ from a sequence of rotations so that


$$
Q^T M = \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} M = \begin{bmatrix} R \\ 0 \end{bmatrix}
\quad\text{and then}\quad \hat{x} = R^{-1} Q_1^T b,
$$

where $b$ is the right side after the same scaling by $\Sigma^{-1/2}$.

The factor $R$ is block bidiagonal. Its last block $R_{k|k}$ reveals everything about $P_{k|k}$ and $\hat{x}_{k|k}$.


Because $Q$ is orthogonal and $R$ is triangular, the overall covariance matrix $P$ and its last
block $P_{k|k}$ are given by


$$P = (R^T R)^{-1} \quad\text{and}\quad P_{k|k} = R_{k|k}^{-1} R_{k|k}^{-T}. \tag{17.72}$$

Thus $R$ is the Cholesky factor ("square root") of the "information matrix" $P^{-1}$ and $R_{k|k}$
is the Cholesky factor of $P_{k|k}^{-1}$. This is a square root information filter. By working
with $Q$ and $R$ it is numerically stable; see also Bierman (1977). The Paige-Saunders
constructions are a little slower than necessary, so today's implementation is different. But
for the third group described above, who understand matrix algebra better than statistics
and probability, this paper will be helpful.
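
The square root idea can be tried on a small weighted least-squares problem. The following sketch is not the Paige-Saunders algorithm itself (it calls MATLAB's built-in `qr` rather than building $Q$ from rotations, and it ignores the block structure); it only illustrates, on made-up data, that the triangular factor of the scaled matrix gives the same estimate and covariance as the normal equations.

```matlab
% Square-root information sketch: scale, orthogonalize, back-substitute.
% A, Sigma and b are small hypothetical examples.
A     = [1 0; 1 1; 1 2; 1 3];
Sigma = diag([1 4 1 4]);
b     = [0; 1; 3; 4];

W = diag(1 ./ sqrt(diag(Sigma)));   % Sigma^(-1/2), valid because Sigma is diagonal
M = W*A;                            % scaled coefficient matrix (unit covariance)
[Q1, R] = qr(M, 0);                 % economy-size factorization M = Q1*R
x_qr = R \ (Q1'*(W*b));             % back substitution with the scaled right side
P_qr = inv(R'*R);                   % covariance from the triangular factor

% Reference: weighted normal equations
N    = A'*(Sigma\A);
x_ne = N \ (A'*(Sigma\b));
```

Here `R'*R` reproduces the information matrix `N`, so `R` plays the role of its square root.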


17.6 Smoothing 


The forward process of filtering produces the values $\hat{x}_{k|k}$. These are the best state estimates
from the observations up to time $k$. Smoothing produces a better estimate $\hat{x}_{k|N}$ for the state
at time $k$ by using the observations up to the later time $N$.

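Actually, filtering is forward elimination and smoothing is back substitution. Both
are to be executed recursively. The normal equations have the block tridiagonal matrix $\mathcal{T}$
as coefficient matrix. As always, the steps of elimination factor that matrix into lower
triangular times upper triangular. There will be a block diagonal matrix in the middle
(containing the pivots), if $L$ and $L^T$ have $I$ as their diagonal:


$$
\mathcal{T} = LDL^T =
\begin{bmatrix} I & & & \\ L_{1|0} & I & & \\ & \ddots & \ddots & \\ & & L_{N|N-1} & I \end{bmatrix}
\begin{bmatrix} P_{0|0}^{-1} & & \\ & \ddots & \\ & & P_{N|N}^{-1} \end{bmatrix}
\begin{bmatrix} I & L_{1|0}^T & & \\ & I & \ddots & \\ & & \ddots & L_{N|N-1}^T \\ & & & I \end{bmatrix}
\tag{17.73}
$$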


This factorization combines in each row the “double step” of prediction followed by cor- 
rection (bordering by the state equation and update by the observation equation). We look 
at the net result in these simple terms: 


1 The off-diagonal block $L_{k|k-1}$ gives the off-diagonal block $T_{k|k-1}$ in $\mathcal{T}$:
Directly from (17.73): $L_{k|k-1} P_{k-1|k-1}^{-1} = T_{k|k-1}$.


2 The diagonal block $P_{k|k}^{-1}$ completes the diagonal block $T_{k|k}$ in $\mathcal{T}$:
Directly from (17.73): $L_{k|k-1} P_{k-1|k-1}^{-1} L_{k|k-1}^T + P_{k|k}^{-1} = T_{k|k}$.


This direct computation of the pivot $P_{k|k}^{-1}$ is what we call the Bayes filter. The Kalman filter
computes the inverse pivot $P_{k|k}$. The steps become completely explicit when we write the
blocks $T_{k|k-1}$ and $T_{k|k}$ in terms of $F_{k-1}$ and $A_k$ and $\Sigma_{\epsilon,k}^{-1}$ and $\Sigma_{e,k}^{-1}$. What we are interested
in now is the solution $\hat{\mathbf{x}}$ to the normal equations. Let us focus on that.

The right side of the normal equations is a long vector $v = \mathcal{A}^T \Sigma^{-1} \mathbf{b}$. The equations
themselves are just $\mathcal{T}\hat{\mathbf{x}} = v$. The point of elimination is to break this into two triangular
systems, and the forward filtering algorithm puts the pivots $P^{-1}$ into the lower triangular
factor:


Forward filtering:    Solve $LD\,\hat{\mathbf{x}}_{\text{filtered}} = v$


Backward smoothing:   Solve $L^T \hat{\mathbf{x}}_{\text{smoothed}} = \hat{\mathbf{x}}_{\text{filtered}}$.
Again, this is forward elimination and back substitution. The intermediate vector $\hat{\mathbf{x}}_{\text{filtered}}$
contains the estimates $\hat{x}_{k|k}$. Then $\hat{\mathbf{x}} = \hat{\mathbf{x}}_{\text{smoothed}}$ is the actual solution to $\mathcal{T}\hat{\mathbf{x}} = v$.
Now we look at smoothing algorithms. Row $k$ of the equation $L^T \hat{\mathbf{x}} = \hat{\mathbf{x}}_{\text{filtered}}$ is


$$\hat{x}_{k|N} + L_{k+1|k}^T \hat{x}_{k+1|N} = \hat{x}_{k|k}. \tag{17.74}$$


Thus back substitution is a backward recursion that starts from $\hat{x}_{N|N}$, which is the last
output from forward filtering:


$$\hat{x}_{k|N} = \hat{x}_{k|k} - L_{k+1|k}^T \hat{x}_{k+1|N} = \hat{x}_{k|k} - P_{k|k} T_{k+1|k}^T \hat{x}_{k+1|N}. \tag{17.75}$$

This is the RTS recursion of Rauch & Tung & Striebel (1965), equation (3.30). Those authors
noted that the pivots could be stored or else computed recursively by a backward filter.
We emphasize that $P_{k|k}$ is the covariance matrix for the filtered (forward) estimate $\hat{x}_{k|k}$. The
covariance matrix for $\hat{x}_{k|N}$ will be smaller, since more information has been used. The
covariance matrix for the complete system is always $(\mathcal{A}^T \Sigma^{-1} \mathcal{A})^{-1}$.


Example 17.3 We bring a numerical example from the original paper by Rauch & Tung 
& Striebel (1965). Consider a dynamical system with four state variables: 


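$$
x_k = F_{k-1}x_{k-1} + \epsilon_k =
\begin{bmatrix} 1 & 1 & 0.5 & 0.5 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0.606 \end{bmatrix} x_{k-1} + \epsilon_k.
$$


Suppose $b_k = [\,1 \;\; 0 \;\; 0 \;\; 0\,]\,x_k + e_k$ is the 1 by 1 output vector that measures $x_1$ (with
noise). The errors $\epsilon_k$ and $e_k$ are independent Gaussian with covariances


$$
\Sigma_{\epsilon,k} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \epsilon \end{bmatrix}
\quad\text{and}\quad \Sigma_{e,k} = 1.
$$


The initial condition $x_0$ is a Gaussian vector with covariance given by $P_{0|0}$.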

The entire system is a linearized version of the in-track motion of a satellite traveling 
in a circular orbit. The satellite motion is affected by both constant and random drag. The 
state variables x1, x2, and x3 can be considered as angular position, velocity, and (constant) 
acceleration. The state variable x4 is a random component of acceleration generated by a 
first-order Gauss-Markov process. 

Three cases will be considered, with different choices for $\epsilon$ and $P_{0|0}$:


1 $\epsilon = 0.0063$ and $P_{0|0} = \operatorname{diag}(1, 1, 1, 0.01)$


2 $\epsilon = 0.000063$ and $P_{0|0} = \operatorname{diag}(1, 1, 1, 0.01)$


3 $\epsilon = 0.0063$ and $P_{0|0} = \operatorname{diag}(100, 100, 100, 0.01)$


Figure 17.1 Variance history of $x_1$ for different values of $\epsilon$ and $P_{0|0}$. (Horizontal axis: observation $k$ from 0 to 25; vertical axis: variance of $x_1$.)



In each case $N = 25$ measurements are taken starting with $b_1$. In Figure 17.1 the variances
of the filtered and smoothed estimates of the state variable $x_1$ are plotted for all three
cases. Smoothing the estimate decreases the errors. Reducing the variance of the random 
disturbance reduces the variance of the estimates. The effect of initial conditions (the a 
priori information about the state) rapidly dies out. 

The code generating Figure 17.1 is named rts. 


Backward Filter 


Smoothing at epoch $k$ involves data from before and after epoch $k$. The accuracy generally
is superior to that of an unsmoothed filter because the estimate involves more observations.
It is easy to imagine a filter working on the data from the last epoch $N$ "backwards" to 0.
So one could hope that a weighted mean of the forward $\hat{x}_k$ and backward $\hat{x}_k^*$ filters yields
an optimal smoother:


$$\hat{x}_{k|N} = A_k \hat{x}_k + (I - A_k)\hat{x}_k^*. \tag{17.76}$$


The weighting matrix $A_k$ is still to be determined, by minimizing the covariance $P_{k|N}$ of
the estimation error. This gives a new approach to smoothing.
Let $P_k$ be the covariance matrix for the forward optimal filter error and $P_k^*$ the
covariance matrix for the backward optimal filter. The law of covariance propagation yields


$$P_{k|N} = E\{(x_k - \hat{x}_{k|N})(x_k - \hat{x}_{k|N})^T\} = A_k P_k A_k^T + (I - A_k) P_k^* (I - A_k)^T. \tag{17.77}$$


We determine $A = A_k$ so that the trace of $P_{k|N}$ becomes a minimum:


$$\frac{\partial}{\partial A} \operatorname{tr}(P_{k|N}) = 2AP - 2(I - A)P^* = 0.$$


This leads to

$$A = P^*(P + P^*)^{-1} \quad\text{and}\quad I - A = P(P + P^*)^{-1}. \tag{17.78}$$

Substituting into equation (17.77), matrix algebra gives the neat result

$$P_{k|N}^{-1} = P_k^{-1} + (P_k^*)^{-1}. \tag{17.79}$$


From this equation follows $P_{k|N} \le P_k$. The smoothed estimate of $x_k$ is always better than
the filtered estimate, when we consider their variances.
Next we insert the expressions $A$ and $I - A$ into equation (17.76) and simplify:


$$\hat{x}_{k|N} = \bigl(P_k^{-1} + (P_k^*)^{-1}\bigr)^{-1}\bigl(P_k^{-1}\hat{x}_k + (P_k^*)^{-1}\hat{x}_k^*\bigr). \tag{17.80}$$


Equation (17.80) is a matrix generalization of the scalar equation (9.53) for weighted
means. The same equation is also the basis for all expressions for smoothing estimates.
But back-substitution is the faster way to evaluate $\hat{x}_{k|N}$.
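
The two ways of writing the combined estimate, with the weighting matrix of (17.78) or in the information form of (17.79) and (17.80), can be compared numerically. This is a minimal sketch with made-up 2 by 2 covariances and estimates; `Pf`, `Pb`, `xf`, `xb` are illustrative names for the forward and backward quantities.

```matlab
% Combine a forward and a backward estimate as in (17.76) to (17.80).
% Covariances and estimates are hypothetical values.
Pf = [2 0.5; 0.5 1];      % forward filter covariance P_k
Pb = [1 -0.2; -0.2 3];    % backward filter covariance P_k^*
xf = [1; 2];              % forward estimate
xb = [1.4; 1.5];          % backward estimate

Aw = Pb / (Pf + Pb);                                          % weighting matrix (17.78)
xs_weighted = Aw*xf + (eye(2) - Aw)*xb;                       % (17.76)
Ps_weighted = Aw*Pf*Aw' + (eye(2) - Aw)*Pb*(eye(2) - Aw)';    % (17.77)

Ps_info = inv(inv(Pf) + inv(Pb));                             % (17.79)
xs_info = Ps_info*(Pf\xf + Pb\xb);                            % (17.80)
```

The forward estimate is weighted by the backward covariance and vice versa, so the less certain estimate receives the smaller weight.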

Usually one distinguishes between three types of smoothers: 


- fixed-interval: Smoothing of data from epoch 0 to $N$; we seek $\hat{x}_{k|N}$ for $k$ from 0 to $N$.
- fixed-point: Smoothing from 0 to an increasing $N$; we seek $\hat{x}_{k|N}$ for a fixed $k$.
- fixed-lag: Smoothing from $k - n$ to $k$ as $k$ increases; $n$ is fixed and we seek $\hat{x}_{k-n|k}$.


The first type is only for post processing. The last two types may also be used in real time. 


Fixed-interval smoothing Given observations from epochs 0 to $N$, we filter forward and
keep the results $\hat{x}_{k|k-1}$, $\hat{x}_{k|k}$, $P_{k|k-1}$, and $P_{k|k}$. Next we filter backward from epoch $N$
to 0 by these recursive formulas (starting from $N - 1$):


$$\hat{x}_{k|N} = \hat{x}_{k|k} + A_k(\hat{x}_{k+1|N} - \hat{x}_{k+1|k}) \tag{17.81}$$


$$A_k = P_{k|k} F_k^T P_{k+1|k}^{-1} \tag{17.82}$$
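
The statement that filtering followed by the backward recursion solves the complete least-squares problem can be tested on a scalar model. In the sketch below the model constants and the observations are invented, and the observation at epoch 0 (with variance `P0`) is assumed to supply the initial estimate; the book's M-file smoother is not reproduced here.

```matlab
% Fixed-interval smoothing for a scalar model, checked against the
% batch weighted least-squares solution. All numbers are hypothetical.
f = 0.9;  q = 0.5;  r = 1;          % x_k = f*x_{k-1} + eps_k,  b_k = x_k + e_k
b  = [1.0 0.7 1.6 1.1 0.4 0.9];     % observations b_0, ..., b_N
P0 = 2;                             % assumed variance of the epoch-0 observation
N  = numel(b) - 1;

% Forward filter; array index k+1 holds epoch k
xp = zeros(1,N+1); Pp = zeros(1,N+1);   % predictions
xf = zeros(1,N+1); Pf = zeros(1,N+1);   % filtered values
xf(1) = b(1);  Pf(1) = P0;
for k = 2:N+1
    xp(k) = f*xf(k-1);
    Pp(k) = f*Pf(k-1)*f + q;
    K     = Pp(k)/(Pp(k) + r);
    xf(k) = xp(k) + K*(b(k) - xp(k));
    Pf(k) = (1 - K)*Pp(k);
end

% Backward recursion (17.81) and (17.82), starting from the last filtered value
xs = xf;
for k = N:-1:1
    Ak    = Pf(k)*f/Pp(k+1);
    xs(k) = xf(k) + Ak*(xs(k+1) - xp(k+1));
end

% Batch solution of the complete system for comparison
rows = 2*N + 1;
Abig = zeros(rows, N+1);  bbig = zeros(rows,1);  w = zeros(rows,1);
Abig(1,1) = 1;  bbig(1) = b(1);  w(1) = 1/P0;
for k = 1:N
    Abig(2*k, k) = -f;  Abig(2*k, k+1) = 1;  w(2*k) = 1/q;          % state equation
    Abig(2*k+1, k+1) = 1;  bbig(2*k+1) = b(k+1);  w(2*k+1) = 1/r;   % observation
end
C = diag(w);
x_batch = (Abig'*C*Abig) \ (Abig'*C*bbig);
```

The smoothed values `xs` agree with `x_batch` in every component, while the filtered values `xf` agree only in the last one.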


Example 17.4 Equations (17.81) and (17.82) are coded as the M-file smoother. Figure 17.2
shows the result of a forward filtering and a smoothing of the filtered values. (The
call was smoother(1,2); this means that $\Sigma_e = 1$ and $\Sigma_\epsilon = 2$.) The data describe the drift
of a steered clock in a GPS receiver. The second graph shows the variances of the filter and
the smoother. The variance curve for smoothing is minimum in the middle and then gets
larger as either end point is approached. This indicates that the best estimation occurs in
the interior region where there is an abundance of observation data in either direction from
the point of estimation. This is exactly what we would expect intuitively.

Figure 17.2 Forward filtering and smoothing of receiver clock offset. The epoch interval
is 15 seconds. Notice the reduction of variance by smoothing.

We note the steady state behavior in the middle of the plot. This happens often when 
the data span is large and the process is stationary. In the steady state region, the gains of 
the filter and smoother and error variance are constant. The calculations are fast and the 
storage is greatly reduced. Sometimes a problem that appears to be quite formidable at first 
glance works out to be feasible because most of the data can be processed in the steady 
state region. 

Often the variance of the innovation due to the predicted state is tested. If it is less 
than a tolerance, the measurement is thrown away since it will have no effect anyway. 
There is no need to activate the filter update. 


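Fixed-point smoothing


$$\hat{x}_{k|N} = \hat{x}_{k|N-1} + B_N(\hat{x}_{N|N} - \hat{x}_{N|N-1}) \tag{17.83}$$

$$B_N = \prod_{i=k}^{N-1} S_i, \qquad S_i = P_{i|i} F_i^T P_{i+1|i}^{-1} \tag{17.84}$$


with initial value $\hat{x}_{k|k} = \hat{x}_k$, for $N = k+1, k+2, \ldots$.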


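Fixed-lag smoothing


$$
\begin{aligned}
\hat{x}_{k+1|k+1+N} = {} & F_k \hat{x}_{k|k+N} + \Sigma_{\epsilon,k}(F_k^T)^{-1} P_{k|k}^{-1}(\hat{x}_{k|k+N} - \hat{x}_{k|k}) \\
& + B_{k+1+N} K_{k+1+N}(b_{k+1+N} - A_{k+1+N} F_{k+N} \hat{x}_{k+N|k+N})
\end{aligned}
\tag{17.85}
$$


$$B_{k+1+N} = \prod_{i=k+1}^{k+N} S_i, \qquad S_i = P_{i|i} F_i^T P_{i+1|i}^{-1} \tag{17.86}$$


for $k = 0, 1, 2, \ldots$ and $\hat{x}_{0|N}$ is the initial condition.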
17.7 An Example from Practice


The filters described so far have been of a simple nature. We now describe an example 
from daily life. It is a model of the 2-dimensional motion of a vessel. Our model is partly 
based on Tiberius (1991). 

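Circular motion is characterized by a constant speed $v$ and a constant acceleration
$a$ in the direction perpendicular to the motion. Let $x_k$ and $y_k$ designate the $x$- and
$y$-coordinates at time $t_k$. If $\alpha_k$ is the heading, the discretized equations of motion are


$$
\begin{aligned}
x_k &= x_{k-1} + \sin\alpha_{k-1}\,\Delta t_k\, v_{k-1} + \tfrac{1}{2}\cos\alpha_{k-1}\,(\Delta t_k)^2 a_{k-1} + \tfrac{1}{6}\cos\alpha_{k-1}\,(\Delta t_k)^3 \dot{a}_{k-1} \\
y_k &= y_{k-1} + \cos\alpha_{k-1}\,\Delta t_k\, v_{k-1} - \tfrac{1}{2}\sin\alpha_{k-1}\,(\Delta t_k)^2 a_{k-1} - \tfrac{1}{6}\sin\alpha_{k-1}\,(\Delta t_k)^3 \dot{a}_{k-1} \\
\alpha_k &= \alpha_{k-1} + \frac{a_{k-1}}{v_{k-1}}\,\Delta t_k + \frac{\dot{a}_{k-1}}{2 v_{k-1}}\,(\Delta t_k)^2,
\end{aligned}
\tag{17.87}
$$
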


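where $\Delta t_k = t_k - t_{k-1}$. We linearize this system and augment the state vector with the
speed $v_k$ and acceleration $a_k$. Thus $\mathbf{x}_k = (x_k, y_k, \alpha_k, v_k, a_k)$. The three equations (17.87),
with $v_k$ and $a_k$ constant, are $\mathbf{x}_k = F_{k-1}\mathbf{x}_{k-1} + \epsilon_k$. Here $\epsilon_k$ includes the error of linearization
and


$$
F_{k-1}\mathbf{x}_{k-1} =
\begin{bmatrix}
1 & 0 & f_1 & \sin\alpha^0\,\Delta t_k & \tfrac{1}{2}\cos\alpha^0\,(\Delta t_k)^2 \\
0 & 1 & f_2 & \cos\alpha^0\,\Delta t_k & -\tfrac{1}{2}\sin\alpha^0\,(\Delta t_k)^2 \\
0 & 0 & 1 & -a^0 \Delta t_k/(v^0)^2 & \Delta t_k/v^0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x_{k-1} \\ y_{k-1} \\ \alpha_{k-1} \\ v_{k-1} \\ a_{k-1} \end{bmatrix}.
$$


We have introduced the following abbreviations


$$
\begin{aligned}
f_1 &= -\bigl(-\cos\alpha^0\, v^0 + \tfrac{1}{2}\sin\alpha^0\,\Delta t_k\, a^0\bigr)\Delta t_k \\
f_2 &= -\bigl(\sin\alpha^0\, v^0 + \tfrac{1}{2}\cos\alpha^0\,\Delta t_k\, a^0\bigr)\Delta t_k.
\end{aligned}
$$

For $\alpha^0$, $v^0$, and $a^0$ we use the estimates $\hat{\alpha}_{k-1|k-1}$, $\hat{v}_{k-1|k-1}$, and $\hat{a}_{k-1|k-1}$.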


Now we make the model random by introducing small random fluctuations in the 
motion. Obviously, sudden large shifts in position and velocity are not likely. So we model 
small random fluctuations into the acceleration. The influence of these fluctuations on the
state vector is described through $\Sigma_{\epsilon,k}$.


Covariance matrix for the system errors For system noise, the first step is to introduce 


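$$
G_k = \begin{bmatrix}
\tfrac{1}{2}\sin\alpha^0\,(\Delta t_k)^2 & \tfrac{1}{6}\cos\alpha^0\,(\Delta t_k)^3 \\
\tfrac{1}{2}\cos\alpha^0\,(\Delta t_k)^2 & -\tfrac{1}{6}\sin\alpha^0\,(\Delta t_k)^3 \\
0 & (\Delta t_k)^2/2v^0 \\
\Delta t_k & 0 \\
0 & \Delta t_k
\end{bmatrix}. \tag{17.88}
$$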
Figure 17.3 Dynamic model of the motion of a vessel. If $d^2s/dt^2 < 0$ we are dealing
with a deceleration, and the tangent vector $T$ changes direction.


$G_k$ determines the influence of acceleration on the state vector $\mathbf{x}_k = (x_k, y_k, \alpha_k, v_k, a_k)$.
The first column is for tangential acceleration (in the direction of motion). The second
column comes from the normal acceleration (across the track). We shall describe this in
detail. Any acceleration $\mathbf{a}$ can be split up into components along the track and across the
track, cf. Strang (1991), page 462. Let $N$ denote the normal vector and $T$ the tangent
vector and $\kappa$ the curvature:


$$\mathbf{a} = \frac{d^2 s}{dt^2}\,T + \kappa\left(\frac{ds}{dt}\right)^2 N.$$


For a straight course the only acceleration is $d^2s/dt^2$. The term $\kappa(ds/dt)^2$ handles the
acceleration in turning. Both have dimension length/(time)$^2$. The change in course is


$$d\alpha = \kappa\, ds = \frac{|a|}{|v|}\, dt. \tag{17.89}$$


The acceleration across the track may have a linear change in time.
Our procedure corresponds to the along track acceleration $a_t$ and the across track
acceleration $a_n$ being uncorrelated random variables with zero mean value:


$$
\Sigma_w = \begin{bmatrix} \sigma_{a_t}^2 & 0 \\ 0 & \sigma_{a_n}^2 \end{bmatrix}
\quad\text{and}\quad \Sigma_{\epsilon,k} = G_k \Sigma_w G_k^T. \tag{17.90}
$$

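In pure determination of position only a part of the state vector is used, e.g. only $x_k$ and $y_k$.
The observation equation becomes


$$
b_k = A_k \mathbf{x}_k + e_k \quad\text{with}\quad A_k = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}.
$$


Realistic values for the (uncorrelated) covariances in $\mathbf{x}_k = (x_k, y_k, \alpha_k, v_k, a_k)$ are $(0.5\ \text{m})^2$,
$(0.5\ \text{m})^2$, $(0.0001\ \text{rad})^2$, $(0.6\ \text{m/s})^2$, and $(1.5\ \text{m/s}^2)^2$.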
Example 17.5 We return to the steady model of Section 17.3 (very steady with no error $\epsilon$
in the scalar state equation). The observation errors are normally distributed with zero
mean and variance $\sigma_e^2$. In other words we have


System equation:        $x_k = x_{k-1}$,  $F_{k-1} = 1$,  $\epsilon_k = 0$
Observation equation:   $b_k = x_k + e_k$,  $A_k = 1$,  $e_k \sim N(0, \sigma_e^2)$.

The covariance prediction is $\sigma_{k|k-1}^2 = \sigma_{k-1|k-1}^2$. (We write $\sigma^2$ for $P$.) The Bayes update is


$$
\sigma_{k|k}^2 = \sigma_{k|k-1}^2 \sigma_e^2 (\sigma_{k|k-1}^2 + \sigma_e^2)^{-1}
= \frac{\sigma_{k|k-1}^2}{1 + \sigma_{k|k-1}^2/\sigma_e^2}
= \frac{\sigma_{k-1|k-1}^2}{1 + \sigma_{k-1|k-1}^2/\sigma_e^2}.
$$


Starting with $\sigma_{0|0}^2 = \sigma_0^2$ we solve the difference equation recursively to get


$$\sigma_{k|k}^2 = \frac{\sigma_0^2}{1 + (\sigma_0^2/\sigma_e^2)k}. \tag{17.91}$$

Thus the updated state vector is

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + \frac{\sigma_0^2/\sigma_e^2}{1 + (\sigma_0^2/\sigma_e^2)k}\,(b_k - \hat{x}_{k|k-1}). \tag{17.92}$$


For sufficiently large $k$ we have $\hat{x}_{k|k} \approx \hat{x}_{k|k-1}$; this agrees well with the fact that new
observations do not contribute much further information.
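
The closed form (17.91) is easy to confirm by running the recursion. In this minimal sketch the two variances are arbitrary illustrative values.

```matlab
% Steady model of Example 17.5: compare the recursion with formula (17.91).
% sigma0_sq and sigmae_sq are hypothetical values.
sigma0_sq = 4;      % initial variance
sigmae_sq = 2;      % observation variance
n = 10;

s = sigma0_sq;
s_rec = zeros(1,n);
for k = 1:n
    s = 1/(1/s + 1/sigmae_sq);   % Bayes update; the prediction leaves s unchanged
    s_rec(k) = s;
end

k = 1:n;
s_closed = sigma0_sq ./ (1 + (sigma0_sq/sigmae_sq)*k);   % (17.91)
gain     = s_closed / sigmae_sq;                         % factor multiplying the innovation in (17.92)
```

The gain decreases steadily with `k`, which is the numerical counterpart of the remark that new observations contribute less and less.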


Example 17.6 Now we add an error $\epsilon_k$ to this steady model:
System equation:        $x_k = x_{k-1} + \epsilon_k$
Observation equation:   $b_k = x_k + e_k$.


The presence of the error term $\epsilon_k$ changes drastically the behavior of the filter.

We use simulated data with normally distributed errors (uncorrelated). For demonstration
purposes the data are subdivided into epochs 1-25, 26-50, 51-75, 76-100, and
101-125. The observation variance is $\sigma_{e,k}^2 = 1$ for epochs 1-75 and $\sigma_{e,k}^2 = 100$ for epochs
76-125. The system variances are $\sigma_{\epsilon,k}^2 = 100$ for epochs 1-50 and $\sigma_{\epsilon,k}^2 = 0.5$ for the rest.

According to Table 17.2 the gain factor $K$ (here a scalar) is

$$
K_k = \frac{\sigma_{k|k-1}^2}{\sigma_{k|k-1}^2 + \sigma_{e,k}^2}
= \frac{\sigma_{k-1|k-1}^2 + \sigma_{\epsilon,k}^2}{\sigma_{k-1|k-1}^2 + \sigma_{\epsilon,k}^2 + \sigma_{e,k}^2}
$$

$$\sigma_{k|k}^2 = (1 - K_k)(\sigma_{k-1|k-1}^2 + \sigma_{\epsilon,k}^2).$$


The gain factor $K_k$ is only large when the observation variance $\sigma_{e,k}^2$ is small. Then the
observation $b_k$ is weighted heavily; this makes the predicted estimate depend primarily on
the latest observation. Figure 17.4 shows the results of the filtering.

Prediction curves are identical (because $F_{k-1} = 1$) to the graphs of $\hat{x}_{k-1|k-1}$, except
they are shifted one epoch. The calculation starts with $\hat{x}_{0|0} = 25$ and $\sigma_{0|0}^2 = 0.001$. (If no
information about $x_0$ were available we would set $\sigma_{0|0}^2 = \infty$ and consequently $K_0 = 1$.)

In epochs 1-25 the system variance $\sigma_{\epsilon,k}^2 = 100$ is large compared to $\sigma_{e,k}^2 = 1$. This
causes the gain factor $K_k$ to be close to unity which again implies that the current observation
$b_k$ determines the state $x_k$. The filtering variance $\sigma_{k|k}^2$ is close to unity and expresses
the uncertainty of $\hat{x}_{k|k-1}$. The observational variance is small and the state values fluctuate
quite a bit. All observations lie inside the confidence band.

Figure 17.4 Filtering of a simulated steady model. The observations are marked with
dots. The thick line indicates the filtered values $\hat{x}_{k-1|k-1}$, the thin line the predicted values
$\hat{x}_{k|k-1} = \hat{x}_{k-1|k-1}$. The two outer lines indicate a confidence band calculated from the
predicted filter by adding and subtracting two standard deviations. Initial values $\hat{x}_{0|0} = 25$ and $\sigma_{0|0}^2 = 0.001$.

In epochs 26-50 the filter uses $\sigma_{\epsilon,k}^2 = 0.5$ and $\sigma_{e,k}^2 = 1$. This is a wrong
specification, because the data in these epochs have system variance 100. The confidence band becomes very narrow. The gain factor
drops to 0.5 as $\sigma_{\epsilon,k}^2 = 0.5$ is slightly smaller than $\sigma_{e,k}^2 = 1$. Now $b_k$ and $\hat{x}_{k|k-1}$ are equally
weighted and the predictions are lagging one step behind the observed values.

The predicted values are as if the state hardly changes. The wrong specification
largely influences this prediction. An earlier state is weighted too heavily. A very large
weight $K_k$ for $b_k$ would be preferable due to the large uncertainty in $x_k$.

The filtering variance converges quickly to 0.5. The state estimates appear more accurate
than for epochs 1-25. The state variance $\sigma_{\epsilon,k}^2$ is smaller so the filtering variance $\sigma_{k|k}^2$ is
calculated as if the state is fairly constant, which obviously is not the case.

Epochs 51-75 have $\sigma_{\epsilon,k}^2 = 0.5$ and $\sigma_{e,k}^2 = 1$ and the gain factor converges to 0.5.
The predicted values are close to the observations. Due to low observational variance they
vary only a little. Small system variances cause fairly constant state values.
Figure 17.5 Gain factor $K_k$ and filtering variance $\sigma_{k|k}^2$ for the model in Figure 17.4.


Epochs 76-100 have $\sigma_{\epsilon,k}^2 = 0.5$ and $\sigma_{e,k}^2 = 100$. This yields a small gain factor. The
predictions are fairly constant with wide band limits because $\hat{x}_{k|k-1}$ is weighted heavily
compared to $b_k$. The predictions are not affected by the observations. The filtering variance
converges to a large value in accordance with the values $\sigma_{\epsilon,k}^2 = 0.5$ and $\sigma_{e,k}^2 = 100$.

The state level becomes more and more uncertain as time goes by. The observations
fluctuate largely and thus become bad for predicting the state level.

In epochs 101-125 the filter uses $\sigma_{\epsilon,k}^2 = 0.5$ and $\sigma_{e,k}^2 = 1$. Here $\sigma_{e,k}^2$ is wrongly
specified, because the data in these epochs have observation variance 100. The gain factor converges to 0.5. Due to the large value for $\sigma_{e,k}^2$ a small
gain would have been preferable. This would have weighted the observation very little.

Due to the wrongly specified variances for epochs 26-50 and 101-125 the bandwidth
is too narrow in these periods leaving many observations outside the band limits.

Figure 17.4 shows that when the Kalman filter is applied to a time series with correctly
specified variances, the observations are situated within the 95% band of confidence. The band is determined by
$\hat{x}_{k|k} \pm 2\sqrt{q_k}$ where $q_k = \sigma_{k-1|k-1}^2 + \sigma_{\epsilon,k}^2 + \sigma_{e,k}^2$. The large values for $\sigma_{\epsilon,k}^2$ and $\sigma_{e,k}^2$ give
rise to a large $q_k$, making the confidence band wider around the observations (and more
uncertainty).

The M-file k_simul produces Figures 17.4 and 17.5. The example demonstrates the
importance of correct variances when a filter is used for prediction and monitoring.
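
The limiting values of the gain quoted in this example follow from iterating the scalar recursions. The sketch below is not the M-file k_simul; it only iterates the variance and gain formulas for two of the variance settings mentioned in the text, starting from the small initial variance 0.001.

```matlab
% Scalar filter of Example 17.6: iterate the variance recursion until the
% gain settles. Only the variances are propagated; no data are needed.
q = 0.5;  r = 1;                  % system and observation variances
p = 0.001;                        % starting filtering variance
for k = 1:200
    p_pred = p + q;               % prediction, since F = 1
    K      = p_pred/(p_pred + r); % gain
    p      = (1 - K)*p_pred;      % filtering variance
end
K_small_q = K;  p_small_q = p;

q = 100;  r = 1;                  % large system variance
p = 0.001;
for k = 1:200
    p_pred = p + q;
    K      = p_pred/(p_pred + r);
    p      = (1 - K)*p_pred;
end
K_large_q = K;  p_large_q = p;
```

With system variance 0.5 and observation variance 1 the gain and the filtering variance both settle at 0.5; with system variance 100 the gain is just below 1 and the filtering variance is just below the observation variance.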




Remark 17.4 In most applications the observation vector $b_k$ has a diagonal covariance
matrix $\Sigma_{e,k}$. The individual components of $b_k$ are uncorrelated. Diagonal matrices give $m$
scalar equations. So it may be advantageous to filter the components of $b_k$ as individual
observations. Treating the problem this way leads to


1 improved numerical properties. Usually the filter must invert $A_k P_{k|k-1} A_k^T + \Sigma_{e,k}$.
This matrix inversion is replaced by $m$ scalar divisions.


2 improved computational time. The vector implementation time grows proportionally
to $m^2$ whereas the row implementation has a growth proportional to $m$.


The implementation involves an extra loop over the rows of the $A$ matrix. We quote the
essential part of the Kalman filter version:


% Coefficient matrix for observations
beta = (f1/f2)^2;  Big = 10^10;
A(1,:) = [1 1 0 0];
% (rows 2 to 4 of A are not shown in this excerpt)

for k = 1:number_of_satellites-1

   b = [P1 Phi1 P2 Phi2];            % Double differenced observations
   x = [];
   x_plus = ... ;                    % Initialization of state vector (elided in the source)
   Sigma_plus = eye(4)*Big;  Q = Big;
   var = [.3^2 .005^2 .3^2 .005^2];  % Variance of observations
   for i = first_epoch:last_epoch    % Kalman filter
      % Prediction: add system noise Q to the first two states
      Sigma_plus(1,1) = Sigma_plus(1,1) + Q;
      Sigma_plus(2,2) = Sigma_plus(2,2) + Q;
      for j = 1:4                    % One scalar update per row of A
         x_minus = x_plus;
         Sigma_minus = Sigma_plus;
         A_row = A(j,:);
         SigmaA_rowt = Sigma_minus*A_row';
         Innovation_variance = A_row*SigmaA_rowt + var(j);
         K = SigmaA_rowt/Innovation_variance;
         omc = b(i,j) - A_row*x_minus;       % observed minus computed
         x_plus = x_minus + K*omc;
         Sigma_plus = Sigma_minus - K*SigmaA_rowt';
      end
      x = [x x_plus];
   end
end
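
That the row-by-row processing reproduces the vector update for a diagonal observation covariance can be checked on a small problem. The following sketch uses invented numbers (three states, four uncorrelated observations) and is independent of the M-files k_row and b_row.

```matlab
% Process uncorrelated observations one row at a time and compare
% with the usual vector update. Values are hypothetical.
x_pred = [1; 2; 0];
P_pred = [4 1 0; 1 3 1; 0 1 2];
A      = [1 0 0; 0 1 1; 1 1 0; 0 0 1];
v      = [0.5 1 2 0.25];              % diagonal of the observation covariance
b      = [1.2; 1.7; 3.4; -0.1];

% Vector update: one 4 by 4 inversion
K_vec = P_pred*A' / (A*P_pred*A' + diag(v));
x_vec = x_pred + K_vec*(b - A*x_pred);
P_vec = (eye(3) - K_vec*A)*P_pred;

% Row update: four scalar divisions
x_row = x_pred;  P_row = P_pred;
for j = 1:numel(v)
    a   = A(j,:);
    Pat = P_row*a';
    K   = Pat/(a*Pat + v(j));
    x_row = x_row + K*(b(j) - a*x_row);
    P_row = P_row - K*Pat';
end
```

The equality relies on the observation errors being uncorrelated; with a full covariance matrix the observations would first have to be decorrelated.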




Table 17.3 Kalman filter for point position

innovation [m]     -4.72    -11.82      0.33      3.15    -88.63
X [m]               2.79      5.66     -4.15     14.77     51.68
Y [m]              -2.87      5.15      0.34     -6.89    -85.45
Z [m]               1.67      9.59     18.12     30.89    139.55
cdt [m]            -1.30     -2.63     -2.06     70.92    372.72
PDOP [ ]          1443.1    1046.6     311.7       4.8       4.1


The row-versions of the Kalman and the Bayes filters are implemented in the M-files
k_row and b_row.


Example 17.7 Example 15.7 used six pseudoranges to locate the receiver. The extra two
pseudoranges were used in a least-squares manner to improve the calculation.

With the filter technique at hand it is natural to find a filter based solution to the
problem. We want to state the right side $b_i$ of the observation equation, or the innovation:


$$\text{innovation} = (P - \text{tropo}) - (\rho_i^k + c(dt_i - dt^k)). \tag{17.93}$$

For independent pseudorange observations, we update after each observation:

function [x,P] = kud(x,P,H,b,var)      % M-file: kud
% KUD  Kalman update, one measurement per call
%      H is a column vector of coefficients, b the observation, var its variance
innovation = b - H'*x;
HP = H'*P;
innovation_variance = HP*H + var;
K = HP'/innovation_variance;
x = x + K*innovation;
P = P - K*HP;

The data from Example 15.7 yield the results in Table 17.3. Note that the fourth satellite
drastically decreases the value of PDOP. This is because the fourth satellite is necessary
for a solution of $X$, $Y$, $Z$, and $c\,dt$. Afterwards PDOP decreases much more slowly!


Example 17.8 A Bayes filter adds on to the normal equations one observation per call. 
The normals are only solved at the end of input. The following simple function is the core: 


function [AtA,Atb] = normals(AtA,Atb,H,innovation,var)     % M-file: normals
% NORMALS   One observation equation is added to the coefficient
%           matrix AtA and the right side Atb.

Atb = Atb + H * innovation / var;
AtA = AtA + H * H' / var;


The final change in estimates is (x, y, z, c dt) = (14.7, —60.0, 46.3, 112.3) with a standard 
deviation of 26 m. The implementation can also be studied in the M-file k_point. 
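
As a minimal sketch of the same idea in Python (an illustration only, not the M-file k_point): each observation is added to the normal equations, and the system is solved once at the end. The helper names add_observation and solve, and the three toy observations, are assumptions.

```python
def add_observation(AtA, Atb, h, innovation, var):
    """Add one weighted observation equation to the normals, in place."""
    n = len(h)
    for i in range(n):
        Atb[i] += h[i] * innovation / var
        for j in range(n):
            AtA[i][j] += h[i] * h[j] / var


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]        # augmented copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


AtA = [[0.0, 0.0], [0.0, 0.0]]
Atb = [0.0, 0.0]
# Toy data: x1 = 1, x2 = 2, x1 + x2 = 3.3, all with unit variance
observations = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 3.3)]
for h, innovation in observations:
    add_observation(AtA, Atb, h, innovation, 1.0)

x_hat = solve(AtA, Atb)      # the normals are solved only at the end of input
print(x_hat)
```

The misclosure of 0.3 in the third observation is shared equally between the two unknowns because all three observations carry the same weight.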
Example 17.9 The least-squares problem in Example 11.11 was solved by the normal
equations. It may as well be solved by means of a filter. The final $\hat{x}_{3|3} = \hat{x}$ must agree
with $\hat{x}$ from the normal equations. The code is contained in the M-file wc.

The three observations have weights 1, 2, and $1 + \delta C_2$. Below we quote results from
the filter process (x denotes a large number). For $\delta C_2 = 0$ the final $\hat{x}_{3|3} = \hat{x}$ agrees with
$\hat{x} = (2/3;\ 17/3)$:


[Filter states $\hat{x}_{1|1}$, $\hat{x}_{2|2}$, $\hat{x}_{3|3}$ and covariances $P_{1|1}$, $P_{2|2}$, $P_{3|3}$; the numerical entries are not recoverable.]


For $\delta C_2 = -1$ (which is the same as omitting the last observation) we find


[Filter states $\hat{x}_{1|1}$, $\hat{x}_{2|2}$, $\hat{x}_{3|3}$ and covariances $P_{1|1}$, $P_{2|2}$, $P_{3|3}$; the numerical entries are not recoverable.]


Finally for $\delta C_2 = \infty$ we still have $\hat{x}_{3|3} = \hat{x}$ (and notice the new $P_{3|3}$):

[Filter states $\hat{x}_{1|1}$, $\hat{x}_{2|2}$, $\hat{x}_{3|3}$ and covariances $P_{1|1}$, $P_{2|2}$, $P_{3|3}$; the numerical entries are not recoverable.]


We finish this chapter by mentioning some M-files that solve earlier least-squares 
problems but now by means of Kalman filters. 


An important fact to note is that setting the measurement error equal to zero in a 
Kalman filter is the same as a constraint in a standard least-squares formulation. 
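
A small numerical check of this fact, written as a hedged Python sketch: the update function is the same scalar update used throughout this chapter, while the prior state, the identity prior covariance, and the constraint x1 + x2 = 4 are assumptions made for the illustration.

```python
def kalman_update(x, P, h, b, var):
    """Scalar-observation Kalman update; var = 0 turns the observation into a constraint."""
    n = len(x)
    innovation = b - sum(h[i] * x[i] for i in range(n))
    hP = [sum(h[i] * P[i][j] for i in range(n)) for j in range(n)]
    innovation_variance = sum(hP[j] * h[j] for j in range(n)) + var
    K = [hP[j] / innovation_variance for j in range(n)]
    x = [x[i] + K[i] * innovation for i in range(n)]
    P = [[P[i][j] - K[i] * hP[j] for j in range(n)] for i in range(n)]
    return x, P


x = [1.0, 2.0]                       # estimate before the constraint (assumed)
P = [[1.0, 0.0], [0.0, 1.0]]         # its covariance (assumed)
h = [1.0, 1.0]

x_c, P_c = kalman_update(x, P, h, 4.0, 0.0)    # "observe" x1 + x2 = 4 with zero variance

constraint_value = h[0] * x_c[0] + h[1] * x_c[1]
variance_along_h = sum(h[i] * P_c[i][j] * h[j] for i in range(2) for j in range(2))
print(x_c, constraint_value, variance_along_h)
```

After the update the constraint holds exactly, and the updated covariance has no variance left in the direction of h, which is what a hard constraint in least squares would give.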




Example 17.10 The M-file fixing1 demonstrates all the computations in Examples 12.1,
12.2, and 12.3 in a filter version.


% FIXING1   Filter version of Examples 12.1, 12.2, and 12.3.
%           Shows the impact of introducing a constraint with
%           zero variance for the observation

big = 1.e6;
A = [-1 0 0 1 0; -1 0 0 0 1; 0 0 -1 1 0; 0 -1 0 0 1; 0 0 0 1 -1];
b = [1.978; 0.732; 0.988; 0.420; 1.258];
Cov = eye(5);
x = zeros(5,1);
P = big * eye(5);

% Regular update
for i = 1:5
   [x,P] = k_update(x,P,A(i,:),b(i),Cov(i,i))
   pause
end

% Update with constraint with variance one
A_aug = [1 1 1 1 1];
b_aug = 100;
Cov_aug = 1;
[x,P] = k_update(x,P,A_aug,b_aug,Cov_aug)
Sigma = (norm(b - A * x))^2 * P
pause

% Update with constraint with variance zero
Cov_aug = 0;
[x,P] = k_update(x,P,A_aug,b_aug,Cov_aug)
Sigma_plus = (norm(b - A * x))^2 * P


Example 17.11 The M-file fixing2 demonstrates all computations in Examples 12.4 and
12.7 in a filter version.


% FIXING2   Filter version of Examples 12.4 and 12.7.
%           Shows the impact of introducing constraints
%           as observations with zero variance

format short
A = [-1     0      1      0      0      0      0      0;
      0.707 0.707  0      0     -0.707 -0.707  0      0;
      0.707 -0.707 0      0      0      0     -0.707  0.707;
      0     0      0.924  0.383 -0.924 -0.383  0      0;
      0     0      0      0      0     -1      0      1;
      0     0      0.924 -0.383  0      0     -0.924  0.383];
b = [0.01; 0.02; 0.03; 0.01; 0.02; 0.03];
Cov = eye(6);
x = zeros(8,1);
P = 1.e6 * eye(8);

% Regular update
for i = 1:6
   [x,P] = k_update(x,P,A(i,:),b(i),Cov(i,i));
   for j = 1:8
      fprintf('%8.4f',x(j))
   end
   fprintf('\n')
end
fprintf('\n')

% Update with constraints, that is observations with variance zero
G = [   1       0       1       0      1    0       1      0;
        0       1       0       1      0    1       0      1;
     -170.71  170.71 -170.71  270.71 -100  100  -241.42  100;
      170.71  170.71  270.71  170.71  100  100   100     241.42]';
% Baarda theory
Sp = [zeros(4,8); -G(5:8,:) * inv(G(1:4,:)) eye(4)];
Cov = (norm(b - A * x))^2 * pinv(A' * A);
Sigma_xp = Sp * Cov * Sp';

% Fixing the first four coordinates to zero values
H = zeros(1,8);
b_fix = 0;
Cov_fix = 0;
for i = 1:4
   H(i) = 1;
   [x,P] = k_update(x,P,H,b_fix,Cov_fix);
   fprintf('\n')
   for j = 1:8
      fprintf('%8.4f',x(j))
   end
end
fprintf('\n')

Sigma = (norm(b - A * x))^2 * P;
0 Introduction


This paper* is a revised version of the one published by W. Gurtner and G. Mader in the 
CSTG GPS Bulletin of September/October 1990. The main reason for a revision is the 
new treatment of antispoofing data by the RINEX format (see chapter 7). Chapter 4 gives 
a recommendation for data compression procedures, especially useful when large amounts 
of data are exchanged through computer networks. In Table A3 in the original paper the 
definition of the PGM / RUN BY / DATE navigation header record was missing, although the 
example showed it. The redefinition of AODE/AODC to IODE/IODC also asks for an update
of the format description. For consistency reasons we also defined a Version 2 format for
the Meteorological Data files (inclusion of an END OF HEADER record and an optional MARKER
NUMBER record).†

In order to have all the available information about RINEX in one place we also 
included parts of earlier papers and a complete set of format definition tables and examples. 


URA Clarification (10-Dec-93) The user range accuracy in the Navigation Message file 
did not contain a definition of the units: There existed two ways of interpretation: Either 
the 4 bit value from the original message or the converted value in meters according to GPS 
ICD-200. In order to simplify the interpretation for the user of the RINEX files I propose 
the bits to be converted into meters prior to RINEX file creation. 


1 The Philosophy of RINEX 


*Version 2 has the following development: Revision, April 1993; Clarification December 1993; Doppler Def-
inition: Jan. 1994; PR Clarification: Oct. 1994; Wlfact Clarification: Feb. 1995; Event Time Frame Clarification:
May 1996; Minor errors in the examples A7/A8: May 1996.

†The slight modification (or rather the definition of a bit in the Loss of Lock Indicator unused so far) to flag
AS data is so small a change that we decided to NOT increase the version number!

The first proposal for the “Receiver Independent Exchange Format” RINEX was developed
by the Astronomical Institute of the University of Berne for the easy exchange of
the GPS data to be collected during the large European GPS campaign EUREF 89, which
involved more than 60 GPS receivers of 4 different manufacturers. The governing aspect
during the development was the following fact:


Most geodetic processing software for GPS data use a well-defined set of observables: 


- the carrier-phase measurement at one or both carriers (actually being a measure- 
ment on the beat frequency between the received carrier of the satellite signal and a 
receiver-generated reference frequency). 


- the pseudorange (code) measurement, equivalent to the difference of the time of 
reception (expressed in the time frame of the receiver) and the time of transmission 
(expressed in the time frame of the satellite) of a distinct satellite signal. 


- the observation time being the reading of the receiver clock at the instant of validity 
of the carrier-phase and/or the code measurements. 


Usually the software assumes that the observation time is valid for both the phase and the 
code measurements, and for all satellites observed. 

Consequently all these programs do not need most of the information that is usually 
stored by the receivers: They need phase, code, and time in the above mentioned defini- 
tions, and some station-related information like station name, antenna height, etc. 


2 General Format Description 


Currently the format consists of three ASCII file types: 
1 Observation Data file 

2 Navigation Message file 

3 Meteorological Data file 


Each file type consists of a header section and a data section. The header section contains 
global information for the entire file and is placed at the beginning of the file. The header 
section contains header labels in columns 61-80 for each line contained in the header 
section. These labels are mandatory and must appear exactly as given in these descriptions 
and examples. 
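
As an illustration of the column rule, the following Python sketch separates a header line into its content (columns 1-60) and its label (columns 61-80). The function name split_header_line is an assumption, and the sample line is built to resemble the first record of an observation file.

```python
def split_header_line(line):
    """Return (content, label): columns 1-60 and the header label in columns 61-80."""
    padded = line.rstrip("\r\n").ljust(80)      # short lines are blank-padded
    return padded[:60], padded[60:80].strip()


sample = ("     2" + " " * 14 + "OBSERVATION DATA".ljust(20)
          + "M (MIXED)".ljust(20) + "RINEX VERSION / TYPE")
content, label = split_header_line(sample)
print(repr(label))
```

Because the label identifies the record, a reader can dispatch on it without relying on the position of the record within the header.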

The format has been optimized for minimum space requirements independent of
the number of different observation types of a specific receiver by indicating in the header 
the types of observations to be stored. In computer systems allowing variable record 
lengths the observation records may then be kept as short as possible. The maximum 
record length is 80 bytes per record. 

Each Observation file and each Meteorological Data file basically contain the data
from one site and one session. RINEX Version 2 also allows observation data from more
than one site, occupied in succession by a roving receiver in rapid static or kinematic
applications, to be included.

If data from more than one receiver has to be exchanged it would not be economical 
to include the identical satellite messages collected by the different receivers several times. 
Therefore the Navigation Message file from one receiver may be exchanged or a com- 
posite Navigation Message file created containing non-redundant information from several 
receivers in order to make the most complete file. 

The format of the data records of the RINEX Version 1 Navigation Message file is 
identical to the former NGS exchange format. 

The actual format descriptions as well as examples are given in the Tables at the end 
of the paper. 


3 Definition of the Observables 


GPS observables include three fundamental quantities that need to be defined: Time, Phase, 
and Range. 


Time The time of the measurement is the receiver time of the received signals. It is iden- 
tical for the phase and range measurements and is identical for all satellites observed 
at that epoch. It is expressed in GPS time (not Universal Time). 


Pseudo-Range The pseudo-range (PR) is the distance from the receiver antenna to the 
satellite antenna including receiver and satellite clock offsets (and other biases, such 
as atmospheric delays): 


PR = distance + c * (receiver clock offset - satellite clock offset + other biases)


so that the pseudo-range reflects the actual behavior of the receiver and satellite 
clocks. The pseudo-range is stored in units of meters. 


Phase The phase is the carrier-phase measured in whole cycles at both L1 and L2. The
half-cycles measured by squaring-type receivers must be converted to whole cycles 
and flagged by the wavelength factor in the header section. 


The phase changes in the same sense as the range (negative doppler). The phase 
observations between epochs must be connected by including the integer number of 
cycles. The phase observations will not contain any systematic drifts from intentional 
offsets of the reference oscillators. 


The observables are not corrected for external effects like atmospheric refraction, satellite 
clock offsets, etc. 

If the receiver or the converter software adjusts the measurements using the
real-time-derived receiver clock offsets dT(r), the consistency of the 3 quantities
phase/pseudo-range/epoch must be maintained, i.e. the receiver clock correction should
be applied to all 3 observables:


Time(corr) = Time(r) — dT(r) 
PR(corr) = PR(r) — dT(r) * c 
phase(corr) = phase(r) — dT(r) * freq 
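
The three corrections can be sketched in Python as follows. The function name correct_observables and the sample values are assumptions; the constants are the speed of light and the GPS L1 carrier frequency.

```python
C = 299792458.0       # speed of light in m/s
F_L1 = 1575.42e6      # GPS L1 carrier frequency in Hz


def correct_observables(time_r, pr_r, phase_r, dT_r, freq=F_L1):
    """Apply the receiver clock offset dT_r (seconds) consistently to all three observables."""
    time_corr = time_r - dT_r               # seconds
    pr_corr = pr_r - dT_r * C               # meters
    phase_corr = phase_r - dT_r * freq      # cycles
    return time_corr, pr_corr, phase_corr


# Hypothetical epoch: 36.0 s, pseudorange in meters, phase in cycles, offset of 1 microsecond
t, pr, ph = correct_observables(36.0, 23629347.915, 0.300, 1.0e-6)
print(t, pr, ph)
```

A clock offset of one microsecond already shifts the pseudorange by about 300 m and the L1 phase by about 1575 cycles, which is why applying the correction to only one observable would destroy their consistency.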


Doppler The sign of the doppler shift as additional observable is defined as usual: Positive 
for approaching satellites. 


4 The Exchange of RINEX Files 


We recommend using the following naming convention for RINEX files: 


   ssssdddf.yyt     ssss:  4-character station name designator
                    ddd:   day of the year of first record
                    f:     file sequence number within day
                           0: file contains all the existing
                              data of the current day
                    yy:    year
                    t:     file type:
                           O: Observation file
                           N: Navigation file
                           M: Meteorological data file
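
A file name following this convention can be taken apart with a short Python sketch. The function name parse_rinex_name and the sample name abcd0830.90O are hypothetical and serve only to illustrate the pattern.

```python
import re

FILE_TYPES = {"O": "Observation", "N": "Navigation", "M": "Meteorological data"}

_NAME = re.compile(
    r"^(?P<station>.{4})(?P<doy>\d{3})(?P<seq>.)\.(?P<yy>\d{2})(?P<type>[ONM])$"
)


def parse_rinex_name(name):
    """Split a name of the form ssssdddf.yyt into its components."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError("not of the form ssssdddf.yyt: %r" % name)
    return {
        "station": m.group("station"),
        "day_of_year": int(m.group("doy")),
        "sequence": m.group("seq"),          # "0" means the whole day in one file
        "year": m.group("yy"),
        "file_type": FILE_TYPES[m.group("type")],
    }


info = parse_rinex_name("abcd0830.90O")
print(info)
```

The two-digit year is kept as a string here, since the convention itself does not say how the century is to be resolved.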


To exchange RINEX files on magnetic tapes we recommend using the following tape 
format: 


- Non-label; ASCII; fixed record length: 80 characters; block size: 8000 
- First file on tape contains list of files using above-mentioned naming conventions 


When data transmission times or storage volumes are critical we recommend compressing
the files prior to storage or transmission using the UNIX “compress” and “uncompress”
programs. Compatible routines are available on VAX/VMS and PC/DOS systems as well.
Proposed naming conventions for the compressed files:


System Observation files Navigation files 
UNIX ssssdddf.yyO.Z ssssdddf.yyN.Z 
VMS ssssdddf.yyO_Z ssssdddf.yyN_Z 
DOS ssssdddf.yyY ssssdddf.yyX 


5 RINEX Version 2 Features 


The following section contains features that have been introduced for RINEX Version 2. 
5.1 Satellite Numbers


Version 2 has been prepared to contain GLONASS or other satellite systems’ observations. 
Therefore we have to be able to distinguish the satellites of the different systems: We 
precede the 2-digit satellite number with a system denominator. 


   snn      s:   Satellite system
                 blank : system as defined in header record
                 G     : GPS
                 R     : GLONASS
                 T     : Transit

            nn:  PRN (GPS), almanac number (GLONASS)
                 or two-digit satellite number


Note: G, R, and T are mandatory in mixed files. 
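
A minimal Python sketch of decoding such designators is given below. The function names and the header_system argument are assumptions; the satellite list is the kind of string found in an epoch record.

```python
SYSTEM_NAMES = {"G": "GPS", "R": "GLONASS", "T": "Transit"}


def parse_satellite(field, header_system="G"):
    """Decode one 3-character designator snn; a blank s falls back to the header system."""
    field = field.ljust(3)[:3]
    system = header_system if field[0] == " " else field[0]
    if system not in SYSTEM_NAMES:
        raise ValueError("unknown satellite system: %r" % system)
    return system, int(field[1:3])


def split_satellite_list(text, header_system="G"):
    """Decode a list of designators packed as consecutive 3-character fields."""
    return [parse_satellite(text[i:i + 3], header_system)
            for i in range(0, len(text), 3)]


satellites = split_satellite_list("G12G 9G 6R21R22")
print(satellites)
```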


5.2 Order of the Header Records 


As the record descriptors in columns 61—80 are mandatory, the programs reading a RINEX 
Version 2 header are able to decode the header records with formats according to the record 
descriptor, provided the records have been first read into an internal buffer. 

We therefore propose to allow free ordering of the header records, with the following 
exceptions: 


- The RINEX VERSION / TYPE record must be the first record in a file


- The default WAVELENGTH FACT L1/2 record (if present) should precede all records defin- 
ing wavelength factors for individual satellites 


- The # OF SATELLITES record (if present) should be immediately followed by the
corresponding number of PRN / # OF OBS records. (These records may be handy for
documentary purposes.) However, since they may only be created after having read
the whole raw data file we define them to be optional.


5.3 Missing Items, Duration of the Validity of Values 


Items that are not known at the file creation time can be set to zero or blank or the respective 
record may be completely omitted. Consequently items of missing header records will be 
set to zero or blank by the program reading RINEX files. Each value remains valid until 
changed by an additional header record. 


5.4 Event Flag Records 


The “number of satellites” also corresponds to the number of records of the same epoch 
followed. Therefore it may be used to skip the appropriate number of records if certain 
event flags are not to be evaluated in detail. 
5.5 Receiver Clock Offset


A large number of users asked to optionally include a receiver-derived clock offset into the 
RINEX format. In order to prevent confusion and redundancy, the receiver clock offset (if 
present) should report the value that has been used to correct the observables according to 
the formulae under item 1. It would then be possible to reconstruct the original observa- 
tions if necessary. As the output format for the receiver-derived clock offset is limited to 
nanoseconds the offset should be rounded to the nearest nanosecond before it is used to 
correct the observables in order to guarantee correct reconstruction. 


6 Additional Hints and Tips 


Programs developed to read RINEX Version 1 files have to verify the version number. 
Version 2 files may look different (version number, END OF HEADER record, receiver and 
antenna serial number alphanumeric) even if they do not use any of the new features. 

We propose that routines to read RINEX Version 2 files automatically delete leading 
blanks in any CHARACTER input field. Routines creating RINEX Version 2 files should 
also left-justify all variables in the CHARACTER fields. 

DOS, and other, files may have variable record lengths, so we recommend first
reading each observation record into an 80-character blank string and decoding the data
afterwards. In variable length records, empty data fields at the end of a record may be
missing, especially in the case of the optional receiver clock offset.
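
The padding recommendation can be sketched in Python as follows. The decoding assumes the m(F14.3,I1,I1) layout of the observation records; the function names and the sample record are assumptions built for the illustration.

```python
def decode_observation_record(record, n_types):
    """Decode up to five (value, LLI, signal strength) fields from one observation record."""
    buf = record.rstrip("\r\n").ljust(80)       # blank-pad to 80 characters first
    fields = []
    for k in range(min(n_types, 5)):
        chunk = buf[16 * k:16 * k + 16]         # F14.3 + I1 + I1
        value = chunk[:14].strip()
        lli = chunk[14].strip()
        strength = chunk[15].strip()
        fields.append((float(value) if value else None,    # blank field: missing observation
                       int(lli) if lli else 0,
                       int(strength) if strength else 0))
    return fields


def field(text, lli=" ", strength=" "):
    """Build one 16-character field for the sample record."""
    return text.rjust(14) + lli + strength


# Sample record with trailing blanks removed, as in a variable-length file
record = (field("23629347.915") + field(".300", strength="8")
          + field("-.353") + field("23629364.158")).rstrip()
decoded = decode_observation_record(record, 4)
print(decoded)
```

Padding before slicing means that a record cut short after its last non-blank character decodes in the same way as a full 80-character record.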


7 RINEX Under Antispoofing (AS) 


Some receivers generate code delay differences between the first and second frequency 
using cross-correlation techniques when AS is on and may recover the phase observations 
on L2 in full cycles. Using the C/A code delay on L1 and the observed difference it is
possible to generate a code delay observation for the second frequency. 

Other receivers recover P code observations by breaking down the Y code into P and 
W code. 

Most of these observations may suffer from an increased noise level. In order to 
enable the postprocessing programs to take special actions, such AS-infected observations 
are flagged using bit number 2 of the Loss of Lock Indicators (i.e. their current values are 
increased by 4). 
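
The bit-coded Loss of Lock Indicator can be unpacked with a few lines of Python. This is a sketch; the function name decode_lli and the dictionary keys are assumptions.

```python
def decode_lli(lli):
    """Unpack the Loss of Lock Indicator (0-7) into its three flags."""
    return {
        "lost_lock": bool(lli & 1),                   # bit 0: cycle slip possible
        "inverse_wavelength_factor": bool(lli & 2),   # bit 1
        "antispoofing": bool(lli & 4),                # bit 2: observation under AS
    }


flags = decode_lli(1 + 4)    # lost lock and antispoofing: value 1 increased by 4
print(flags)
```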
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fon en en nn ee ee eo ee 5 ee ee ee ee ee + 
| TABLE A1 | 
| OBSERVATION DATA FILE - HEADER SECTION DESCRIPTION | 
+-----------~--------- $--- 3 --- =~ 2 + 5 + + + +------------ + 
| HEADER LABEL | DESCRIPTION | FORMAT | 
| (Columns 61-80) | | | 
+-------------------- +------------------------------------------ +------------ + 
|RINEX VERSION / TYPE| - Format version (2) | I16,14X, | 
| | - File type (’0O’ for Observation Data) | A1,19X, | 
| | - Satellite System: blank or ’G’: GPS | A1,19X | 
| | *R’: GLONASS | | 
| | ’T’: NNSS Transit | | 
| | "M’: Mixed | | 
+-------------------- fon none ee ee +------------ + 
|PGM / RUN BY / DATE | - Name of program creating current file | A20, | 
| | - Name of agency creating current file | A20, | 
| | - Date of file creation | A20 | 
+-------------------- $------- - ee == +------------ + 

* | COMMENT | Comment line(s) | A60 7 
+-------------------- foo nee ee  - -- +------------ + 
|MARKER NAME | Name of antenna marker | A60 | 
+-------------------- $r---------- + +--+ +--+ 5 + = === - +------------ + 

* | MARKER NUMBER | Number of antenna marker | A20 | * 
+-------------------- pon---- +--+ +------------ + 
| OBSERVER / AGENCY | Name of observer / agency | A20,A40 | 
+-------------------- oo +------------ + 
|REC # / TYPE / VERS | Receiver number, type, and version | 3A20 | 
| | (Version: e.g. Internal Software Version) | | 
+-------------------- +---------------- ~~~ ~~ +--+ +------------ + 
[ANT # / TYPE | Antenna number and type | 2A20 | 
+-------------------- poo nea ee ee +------------ + 
| APPROX POSITION XYZ | Approximate marker position (WGS84) | 3F14.4 | 
+-------------------- $o--------------- ~~~ +--+ - = - +--+ +--+ - +------------ + 
| ANTENNA: DELTA H/E/N| - Antenna height: Height of bottom | 3F14.4 | 
| | surface of antenna above marker | | 
| | - Eccentricities of antenna center | | 
| | relative to marker to the east | | 
| | and north (all units in meters) | | 
+-------------------- $-------------------- ~~~ - - ~~ -- = - = ----- +------------ + 
[WAVELENGTH FACT L1/2| - Wavelength factors for L1 and L2 | 216, | 

| | 


1: Full cycle ambiguities 
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* | INTERVAL 

+ Ne eee 
[TIME OF FIRST OBS 

| 

tee ee en re ne en ee 
*|TIME OF LAST OBS 

| 

+ www ww eee eee ee ee m Á a a 
*|# OF SATELLITES 

| 

C monkonko en anena a a a aa 
*|PRN / # OF OBS 


| END OF HEADER 
+ - æ m l l l M M M m a M l l M m ‘M 


2: 


Half cycle ambiguities (squaring) 


0O Cin L2): Single frequency instrument 
- Number of satellites to follow in list 
0 or blank: Default wavelength factors 
Maximum 7. If more than 7 satellites: 
Repeat record. 
- List of PRNs (satellite numbers) 


- Number of different observation types 


- Observation types 


The following observation types are 
defined in RINEX Version 2: 


L1, L2: 


C1 


| 
| 
| 
| 
| 
| 
| 
+ 
| 
stored in the file | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 


Phase measurements on L1 and L2 


: Pseudorange using C/A-Code on L1 
P1, P2: 
D1, D2: 
Fly- t2: 


Pseudorange using P-Code on L1,L2 
Doppler frequency on L1 and L2 
Transit Integrated Doppler on 

150 (T1) and 400 MHz (T2) 


Observations collected under Antispoofing| 
are converted to "L2" or "P2" and flagged| 
with bit 2 of loss of lock indicator | 
(see Table A2). 


Units : 


Phase : full cycles 
Pseudorange : meters 
Doppler : Hz 


Transit > cycles 


The sequence of the types in this record 
has to correspond to the sequence of the 


| 
| 
| 
| 
| 
| 
| 
| 
| 
observations in the observation records | 
+ 
| 
+ 
| 


Time of first observation record 
year (4 digits), month,day,hour,min,sec| 
~ 22-25 ee ee +------------+ 
Time of last observation record | 
year (4 digits), month,day,hour,min,sec| 


ee æ we = we a ww ee we a ee ee ee ee ee m M + 


Number of satellites, for which | 
observations are stored in the file | 


jm wee m om o m ome m m m ee ee ee ee we a a ee a i we we ew a + 


PRN (sat.number), number of observations | 
for each observation type indicated | 
in the "# / TYPES OF OBSERV" - record. 
This record is repeated for each 
satellite present in the data file 


| 
| 
| 
re ee we we we wwe ww wwe we we a we oe we wes we a we we we ae we ee we a a + 
| 
+ 


Records marked with * are optional 


| 
| 
I6, | 

| 

| 

| 
7(3X,A1,12) | 
Seneca nee + 


I6, 


9(4X,A2) 


5I6,F12.6 | 


516,F12.6 |* 


wee ee we ee ee ee + 
I6 | * 

| 

cst as aos tee ie + 
3X,A1,12,916|* 
+----------------------------------------------------------------------------+
|                                  TABLE A2                                  |
|              OBSERVATION DATA FILE - DATA RECORD DESCRIPTION               |
+-------------+-------------------------------------------------+------------+
| OBS. RECORD | DESCRIPTION                                     |   FORMAT   |
+-------------+-------------------------------------------------+------------+
| EPOCH/SAT   | - Epoch :                                       |            |
|     or      |     year (2 digits), month,day,hour,min,sec     | 5I3,F11.7, |
| EVENT FLAG  | - Epoch flag  0: OK                             |    I3,     |
|             |               1: power failure between          |            |
|             |                  previous and current epoch     |            |
|             |              >1: Event flag                     |            |
|             | - Number of satellites in current epoch         |    I3,     |
|             | - List of PRNs (sat.numbers) in current epoch   | 12(A1,I2), |
|             |   If more than 12 satellites: Continued in      |            |
|             |   next line with n(A1,I2)                       |            |
|             | - receiver clock offset (seconds, optional)     |   F12.9    |
|             |                                                 |            |
|             | If EVENT FLAG record (epoch flag > 1):          |            |
|             |   - Event flag:                                 |            |
|             |     2: start moving antenna                     |            |
|             |     3: new site occupation (end of kinem. data) |            |
|             |        (at least MARKER NAME record follows)    |            |
|             |     4: header information follows               |            |
|             |     5: external event (epoch is significant,    |            |
|             |        same time frame as observation time tags)|            |
|             |     6: cycle slip records follow to optionally  |            |
|             |        report detected and repaired cycle slips |            |
|             |        (same format as OBSERVATIONS records;    |            |
|             |        slip instead of observation; LLI and     |            |
|             |        signal strength blank)                   |            |
|             |   - "Number of satellites" contains number of   |            |
|             |     records to follow (0 for event flags 2,5)   |            |
+-------------+-------------------------------------------------+------------+
| OBSERVATIONS| - Observation      | rep. within record for     |  m(F14.3,  |
|             | - LLI              | each obs.type (same seq    |    I1,     |
|             | - Signal strength  | as given in header)        |    I1)     |
|             |                                                 |            |
|             | This record is repeated for each satellite      |            |
|             | given in EPOCH/SAT - record.                    |            |
|             |                                                 |            |
|             | If more than 5 observation types (=80 char):    |            |
|             | Continue observations in next record.           |            |
|             |                                                 |            |
|             | Observations:                                   |            |
|             |   Phase : Units in whole cycles of carrier      |            |
|             |   Code  : Units in meters                       |            |
|             | Missing observations are written as 0.0         |            |
|             | or blanks.                                      |            |
|             |                                                 |            |
|             | Loss of lock indicator (LLI). Range: 0-7        |            |
|             |  0 or blank: OK or not known                    |            |
|             |  Bit 0 set : lost lock between previous and     |            |
|             |              current observation: cycle slip    |            |
|             |              possible                           |            |
|             |  Bit 1 set : Inverse wavelength factor to       |            |
|             |              default (does NOT change default)  |            |
|             |  Bit 2 set : observation under Antispoofing     |            |
|             |              (may suffer from increased noise)  |            |
|             |                                                 |            |
|             |  Bits 0 and 1 for phase only.                   |            |
|             |                                                 |            |
|             | Signal strength projected into interval 1-9:    |            |
|             |  1: minimum possible signal strength            |            |
|             |  5: threshold for good S/N ratio                |            |
|             |  9: maximum possible signal strength            |            |
|             |  0 or blank: not known, don't care              |            |
+-------------+-------------------------------------------------+------------+

 +----------------------------------------------------------------------------+
 |                                  TABLE A3                                  |
 |            NAVIGATION MESSAGE FILE - HEADER SECTION DESCRIPTION            |
 +--------------------+------------------------------------------+------------+
 |    HEADER LABEL    |               DESCRIPTION                |   FORMAT   |
 |  (Columns 61-80)   |                                          |            |
 +--------------------+------------------------------------------+------------+
 |RINEX VERSION / TYPE| - Format version (2)                     |   I6,14X,  |
 |                    | - File type ('N' for Navigation data)    |   A1,19X   |
 +--------------------+------------------------------------------+------------+
 |PGM / RUN BY / DATE | - Name of program creating current file  |    A20,    |
 |                    | - Name of agency creating current file   |    A20,    |
 |                    | - Date of file creation                  |    A20     |
 +--------------------+------------------------------------------+------------+
*|COMMENT             | Comment line(s)                          |    A60     |*
 +--------------------+------------------------------------------+------------+
*|ION ALPHA           | Ionosphere parameters A0-A3 of almanac   | 2X,4D12.4  |*
 |                    | (page 18 of subframe 4)                  |            |
 +--------------------+------------------------------------------+------------+
*|ION BETA            | Ionosphere parameters B0-B3 of almanac   | 2X,4D12.4  |*
 +--------------------+------------------------------------------+------------+
*|DELTA-UTC: A0,A1,T,W| Almanac parameters to compute time in UTC|3X,2D19.12, |*
 |                    | (page 18 of subframe 4)                  |    2I9     |
 |                    | A0,A1: terms of polynomial               |            |
 |                    | T    : reference time for UTC data       |            |
 |                    | W    : UTC reference week number         |            |
 +--------------------+------------------------------------------+------------+
*|LEAP SECONDS        | Delta time due to leap seconds           |     I6     |*
 +--------------------+------------------------------------------+------------+
 |END OF HEADER       | Last record in the header section.       |    60X     |
 +--------------------+------------------------------------------+------------+

                        Records marked with * are optional

 +----------------------------------------------------------------------------+
 |                                  TABLE A4                                  |
 |             NAVIGATION MESSAGE FILE - DATA RECORD DESCRIPTION              |
 +--------------------+------------------------------------------+------------+
 |    OBS. RECORD     | DESCRIPTION                              |   FORMAT   |
 +--------------------+------------------------------------------+------------+
 |PRN / EPOCH / SV CLK| - Satellite PRN number                   |    I2,     |
 |                    | - Epoch: Toc - Time of Clock             |            |
 |                    |          year (2 digits)                 |    5I3,    |
 |                    |          month                           |            |
 |                    |          day                             |            |
 |                    |          hour                            |            |
 |                    |          minute                          |            |
 |                    |          second                          |   F5.1,    |
 |                    | - SV clock bias       (seconds)          |  3D19.12   |
 |                    | - SV clock drift      (sec/sec)          |            |
 |                    | - SV clock drift rate (sec/sec^2)        |            |
 +--------------------+------------------------------------------+------------+
 |BROADCAST ORBIT - 1 | - IODE Issue of Data, Ephemeris          | 3X,4D19.12 |
 |                    | - Crs                 (meters)           |            |
 |                    | - Delta n             (radians/sec)      |            |
 |                    | - M0                  (radians)          |            |
 +--------------------+------------------------------------------+------------+
 |BROADCAST ORBIT - 2 | - Cuc                 (radians)          | 3X,4D19.12 |
 |                    | - e Eccentricity                         |            |
 |                    | - Cus                 (radians)          |            |
 |                    | - sqrt(A)             (sqrt(m))          |            |
 +--------------------+------------------------------------------+------------+
 |BROADCAST ORBIT - 3 | - Toe Time of Ephemeris                  | 3X,4D19.12 |
 |                    |                       (sec of GPS week)  |            |
 |                    | - Cic                 (radians)          |            |
 |                    | - OMEGA               (radians)          |            |
 |                    | - Cis                 (radians)          |            |
 +--------------------+------------------------------------------+------------+
 |BROADCAST ORBIT - 4 | - i0                  (radians)          | 3X,4D19.12 |
 |                    | - Crc                 (meters)           |            |
 |                    | - omega               (radians)          |            |
 |                    | - OMEGA DOT           (radians/sec)      |            |
 +--------------------+------------------------------------------+------------+
 |BROADCAST ORBIT - 5 | - IDOT                (radians/sec)      | 3X,4D19.12 |
 |                    | - Codes on L2 channel                    |            |
 |                    | - GPS Week # (to go with TOE)            |            |
 |                    | - L2 P data flag                         |            |
 +--------------------+------------------------------------------+------------+
 |BROADCAST ORBIT - 6 | - SV accuracy         (meters)           | 3X,4D19.12 |
 |                    | - SV health           (MSB only)         |            |
 |                    | - TGD                 (seconds)          |            |
 |                    | - IODC Issue of Data, Clock              |            |
 +--------------------+------------------------------------------+------------+
 |BROADCAST ORBIT - 7 | - Transmission time of message           | 3X,4D19.12 |
 |                    |   (sec of GPS week, derived e.g.         |            |
 |                    |   from Z-count in Hand Over Word (HOW)   |            |
 |                    | - spare                                  |            |
 |                    | - spare                                  |            |
 |                    | - spare                                  |            |
 +--------------------+------------------------------------------+------------+

 +----------------------------------------------------------------------------+
 |                                  TABLE A5                                  |
 |           METEOROLOGICAL DATA FILE - HEADER SECTION DESCRIPTION            |
 +--------------------+------------------------------------------+------------+
 |    HEADER LABEL    |               DESCRIPTION                |   FORMAT   |
 |  (Columns 61-80)   |                                          |            |
 +--------------------+------------------------------------------+------------+
 |RINEX VERSION / TYPE| - Format version (2)                     |   I6,14X,  |
 |                    | - File type ('M' for Meteorological Data)|   A1,39X   |
 +--------------------+------------------------------------------+------------+
 |PGM / RUN BY / DATE | - Name of program creating current file  |    A20,    |
 |                    | - Name of agency creating current file   |    A20,    |
 |                    | - Date of file creation                  |    A20     |
 +--------------------+------------------------------------------+------------+
*|COMMENT             | Comment line(s)                          |    A60     |*
 +--------------------+------------------------------------------+------------+
 |MARKER NAME         | Station Name                             |    A60     |
 |                    | (preferably identical to MARKER NAME in  |            |
 |                    | the associated Observation File)         |            |
 +--------------------+------------------------------------------+------------+
*|MARKER NUMBER       | Station Number                           |    A20     |*
 |                    | (preferably identical to MARKER NUMBER in|            |
 |                    | the associated Observation File)         |            |
 +--------------------+------------------------------------------+------------+
 |# / TYPES OF OBSERV | - Number of different observation types  |    I6,     |
 |                    |   stored in the file                     |            |
 |                    | - Observation types                      |  9(4X,A2)  |
 |                    |                                          |            |
 |                    | The following meteorological observation |            |
 |                    | types are defined in RINEX Version 2:    |            |
 |                    |                                          |            |
 |                    | PR : Pressure (mbar)                     |            |
 |                    | TD : Dry temperature (deg Celsius)       |            |
 |                    | HR : Relative Humidity (percent)         |            |
 |                    | ZW : Wet zenith path delay (millimeters) |            |
 |                    |      (for WVR data)                      |            |
 |                    |                                          |            |
 |                    | The sequence of the types in this record |            |
 |                    | must correspond to the sequence of the   |            |
 |                    | measurements in the data records         |            |
 +--------------------+------------------------------------------+------------+
 |END OF HEADER       | Last record in the header section.       |    60X     |
 +--------------------+------------------------------------------+------------+

                        Records marked with * are optional

+----------------------------------------------------------------------------+
|                                  TABLE A6                                  |
|             METEOROLOGICAL DATA FILE - DATA RECORD DESCRIPTION             |
+-------------+-------------------------------------------------+------------+
| OBS. RECORD | DESCRIPTION                                     |   FORMAT   |
+-------------+-------------------------------------------------+------------+
| EPOCH / MET | - Epoch in GPS time (not local time!)           |    6I3,    |
|             |     year (2 digits), month,day,hour,min,sec     |            |
|             |                                                 |            |
|             | - Met data in the same sequence as given in the |   mF7.1    |
|             |   header                                        |            |
+-------------+-------------------------------------------------+------------+

+------------------------------------------------------------------------------+
|                                   TABLE A7                                   |
|                       OBSERVATION DATA FILE - EXAMPLE                        |
+------------------------------------------------------------------------------+
----|---1|0---|---2|0---|---3|0---|---4|0---|---5|0---|---6|0---|---7|0---|---8|

     2              OBSERVATION DATA    M (MIXED)           RINEX VERSION / TYPE
BLANK OR G = GPS,  R = GLONASS,  T = TRANSIT,  M = MIXED    COMMENT
XXRINEXO V9.9       AIUB                22-APR-93 12:43     PGM / RUN BY / DATE
EXAMPLE OF A MIXED RINEX FILE                               COMMENT
A 9080                                                      MARKER NAME
9080.1.34                                                   MARKER NUMBER
BILL SMITH          ABC INSTITUTE                           OBSERVER / AGENCY
X1234A123           XX                  ZZZ                 REC # / TYPE / VERS
234                 YY                                      ANT # / TYPE
  4375274.       587466.      4589095.                      APPROX POSITION XYZ
         .9030         .0000         .0000                  ANTENNA: DELTA H/E/N
     1     1                                                WAVELENGTH FACT L1/2
     1     2     6   G14   G15   G16   G17   G18   G19      WAVELENGTH FACT L1/2
     4    P1    L1    L2    P2                              # / TYPES OF OBSERV
    18                                                      INTERVAL
  1990     3    24    13    10   36.000000                  TIME OF FIRST OBS
                                                            END OF HEADER
 90  3 24 13 10 36.0000000  0  3G12G 9G 6                            -.123456789
  23629347.915            .300 8         -.353    23629364.158
  20891534.648           -.120 9         -.358    20891541.292
  20607600.189           -.430 9          .394    20607605.848
 90  3 24 13 10 50.0000000  4  3
     1     2     2   G 9   G12                              WAVELENGTH FACT L1/2
  *** WAVELENGTH FACTOR CHANGED FOR 2 SATELLITES ***        COMMENT
                                                            COMMENT
 90  3 24 13 10 54.0000000  0  5G12G 9G 6R21R22                      -.123456789
  23619095.450      -53875.632 8    -41981.375    23619112.008
  20886075.667      -28688.027 9    -22354.535    20886082.101
  20611072.689       18247.789 9     14219.770    20611078.410
  21345678.576       12345.567 5
  22123456.789       23456.789 5
 90  3 24 13 11  0.0000000  2
                            4  1
            *** FROM NOW ON KINEMATIC DATA! ***             COMMENT
 90  3 24 13 11 48.0000000  0  4G16G12G 9G 6                         -.123456789
  21110991.756       16119.980 7     12560.510    21110998.441
  23588424.398     -215050.557 6   -167571.734    23588439.570
  20869878.790     -113803.187 8    -88677.926    20869884.938
  20621643.727       73797.462 7     57505.177    20621649.276
                            3  4
A 9080                                                      MARKER NAME
9080.1.34                                                   MARKER NUMBER
         .9030         .0000         .0000                  ANTENNA: DELTA H/E/N
          --> THIS IS THE START OF A NEW SITE <--           COMMENT
 90  3 24 13 12  6.0000000  0  4G16G12G 6G 9                         -.123456987
  21112589.384       24515.877 6     19102.763 3  21112596.187
  23578228.338     -268624.234 7   -209317.284 4  23578244.398
  20625218.088       92581.207 7     72141.846 4  20625223.795
  20864539.693     -141858.836 8   -110539.435 5  20864545.943
 90  3 24 13 13  1.2345678  5  0
                            4  1
        (AN EVENT FLAG WITH SIGNIFICANT EPOCH)              COMMENT
 90  3 24 13 14 12.0000000  0  4G16G12G 9G 6                         -.123456012
  21124965.133       89551.30216     69779.62654  21124972.2754
  23507272.372     -212616.150 7   -165674.789 5  23507288.421
  20828010.354     -333820.093 6   -260119.395 5  20828017.129
  20650944.902      227775.130 7    177487.651 4  20650950.363
                            4  1
           *** ANTISPOOFING ON G 16 AND LOST LOCK           COMMENT
 90  3 24 13 14 12.0000000  6  2G16G 9
                 123456789.0      -9876543.5
                         0.0            -0.5
                            4  2
           ---> CYCLE SLIPS THAT HAVE BEEN APPLIED TO       COMMENT
                THE OBSERVATIONS                            COMMENT
 90  3 24 13 14 48.0000000  0  4G16G12G 9G 6                         -.123456234
  21128884.159      110143.144 7     85825.18545  21128890.7764
  23487131.045     -318463.297 7   -248152.728 4  23487146.149
  20817844.743     -387242.571 6   -301747.22925  20817851.322
  20658519.895      267583.67817    208507.26234  20658525.869
                            4  3
         *** SATELLITE G 9 THIS EPOCH ON WLFACT 1 (L2)      COMMENT
         *** G 6 LOST LOCK AND ON WLFACT 2 (L2)             COMMENT
             (INVERSE TO PREVIOUS SETTINGS)                 COMMENT

----|---1|0---|---2|0---|---3|0---|---4|0---|---5|0---|---6|0---|---7|0---|---8|
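The epoch lines in the observation example follow a fixed-column layout, so they can be read by slicing at known positions. The sketch below is a minimal illustration in Python, assuming the RINEX 2 field widths (five date fields of width 3, seconds of width 11, a flag, a count of width 3, then satellite identifiers of width 3). The function name `parse_epoch_line` is hypothetical and not part of any library.

```python
def parse_epoch_line(line):
    """Split one RINEX 2 observation epoch line into its fixed-width fields."""
    # Year (2 digits), month, day, hour, minute: five fields of width 3.
    yy, mo, dd, hh, mi = (int(line[i:i + 3]) for i in range(0, 15, 3))
    sec = float(line[15:26])   # seconds, width 11
    flag = int(line[26:29])    # epoch/event flag (two blanks, one digit)
    count = int(line[29:32])   # satellites (flags 0, 1, 6) or following records
    sats = []
    if flag in (0, 1, 6):
        # Each satellite is a system letter plus a 2-digit number, e.g. "G 9".
        sats = [line[32 + 3 * k:35 + 3 * k].replace(" ", "0") for k in range(count)]
    return {"time": (yy, mo, dd, hh, mi, sec), "flag": flag,
            "count": count, "sats": sats}


first = parse_epoch_line(" 90  3 24 13 10 36.0000000  0  3G12G 9G 6")
event = parse_epoch_line(" 90  3 24 13 10 50.0000000  4  3")
print(first["sats"], event["flag"], event["count"])
```

For an event flag such as 4, the count is the number of header-style records that follow, so no satellite list is read.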
+------------------------------------------------------------------------------+
|                                   TABLE A8                                   |
|                       NAVIGATION MESSAGE FILE - EXAMPLE                      |
+------------------------------------------------------------------------------+
----|---1|0---|---2|0---|---3|0---|---4|0---|---5|0---|---6|0---|---7|0---|---8|

     2              N: GPS NAV DATA                         RINEX VERSION / TYPE
XXRINEXN V2.0       AIUB                12-SEP-90 15:22     PGM / RUN BY / DATE
EXAMPLE OF VERSION 2 FORMAT                                 COMMENT
     .1676D-07   .2235D-07  -.1192D-06  -.1192D-06          ION ALPHA
     .1208D+06   .1310D+06  -.1310D+06  -.1966D+06          ION BETA
     .133179128170D-06  .107469588780D-12   552960       39 DELTA-UTC: A0,A1,T,W
     6                                                      LEAP SECONDS
                                                            END OF HEADER
 6 90  8  2 17 51 44.0 -.839701388031D-03 -.165982783074D-10  .000000000000D+00
     .910000000000D+02  .934062500000D+02  .116040547840D-08  .162092304801D+00
     .484101474285D-05  .626740418375D-02  .652112066746D-05  .515365489006D+04
     .409904000000D+06 -.242143869400D-07  .329237003460D+00 -.596046447754D-07
     .111541663136D+01  .326593750000D+03  .206958726335D+01 -.638312302555D-08
     .307155651409D-09  .000000000000D+00  .551000000000D+03  .000000000000D+00
     .000000000000D+00  .000000000000D+00  .000000000000D+00  .910000000000D+02
     .406800000000D+06
13 90  8  2 19  0  0.0  .490025617182D-03  .204636307899D-11  .000000000000D+00
     .133000000000D+03 -.963125000000D+02  .146970407622D-08  .292961152146D+01
    -.498816370964D-05  .200239347760D-02  .928156077862D-05  .515328476143D+04
     .414000000000D+06 -.279396772385D-07  .243031939942D+01 -.558793544769D-07
     .110192796930D+01  .271187500000D+03 -.232757915425D+01 -.619632953057D-08
    -.785747015231D-11  .000000000000D+00  .551000000000D+03  .000000000000D+00
     .000000000000D+00  .000000000000D+00  .000000000000D+00  .389000000000D+03
     .410400000000D+06

----|---1|0---|---2|0---|---3|0---|---4|0---|---5|0---|---6|0---|---7|0---|---8|
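The navigation records use FORTRAN-style `D` exponents, which most languages do not read directly. A minimal Python sketch, assuming continuation lines laid out as three blanks followed by four fields of width 19 (the function name `parse_nav_fields` and its default arguments are illustrative, not part of any library):

```python
def parse_nav_fields(line, start=3, width=19, count=4):
    """Read up to `count` D-exponent numbers from one navigation record line."""
    values = []
    for k in range(count):
        field = line[start + k * width:start + (k + 1) * width].strip()
        if field:  # short lines (e.g. the last record) leave fields empty
            values.append(float(field.replace("D", "E")))
    return values


row = "     .910000000000D+02  .934062500000D+02  .116040547840D-08  .162092304801D+00"
last = "     .406800000000D+06"
print(parse_nav_fields(row))
print(parse_nav_fields(last))
```

Slicing by column rather than splitting on whitespace matters here because a negative number can touch the preceding field with no blank in between.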
+------------------------------------------------------------------------------+
|                                   TABLE A9                                   |
|                      METEOROLOGICAL DATA FILE - EXAMPLE                      |
+------------------------------------------------------------------------------+
----|---1|0---|---2|0---|---3|0---|---4|0---|---5|0---|---6|0---|---7|0---|---8|

     2              METEOROLOGICAL DATA                     RINEX VERSION / TYPE
XXRINEXM V9.9       AIUB                22-APR-93 12:43     PGM / RUN BY / DATE
EXAMPLE OF A MET DATA FILE                                  COMMENT
A 9080                                                      MARKER NAME
     3    PR    TD    HR                                    # / TYPES OF OBSERV
                                                            END OF HEADER
 90  3 24 13 10 15  987.1   10.6   89.5
 90  3 24 13 10 30  987.2   10.9   90.0
 90  3 24 13 10 45  987.1   11.6   89.0
Almanac Approximate location information on all satellites. 


Ambiguity The phase measurement when a receiver first locks onto a GPS signal is am- 
biguous by an integer number of cycles (because the receiver has no way of counting 
the cycles between satellite and receiver). This ambiguity remains constant until loss 
of lock. 


Analog A clock with moving hands is an analog device; a clock with displayed numbers 
is digital. Over telephone lines, digital signals must be converted to analog using a 
modem (a modulator/demodulator). 


Antispoofing (AS) Encrypting the P code by addition (modulo 2) of a secret W code. The 
resulting Y code prevents a receiver from being “spoofed” by a false P code signal. 


Atomic clock A highly precise clock based on the behavior of elements such as cesium, 
hydrogen, and rubidium. 


Azimuth Angle at your position between the meridian and the direction to a target. 
Bandwidth The range of frequencies in a signal. 
Bearing The angle at your position between grid north and the direction to a target. 


Binary Biphase Modulation The phase of a GPS carrier signal is shifted by 180° when 
there is a transition from 0 to 1 or 1 to 0. 


Block I, II, IIR, IIF satellites The generations of GPS satellites: Block I were prototypes 
first launched in 1978; 24 Block II satellites made up the fully operational GPS 
constellation declared in 1995; Block IIR are replacement satellites. Block IIF refers 
to the follow-on generation. 


C/A code The coarse/acquisition or clear/acquisition code modulated onto the GPS L1 
signal. This Gold code is a sequence of 1023 pseudorandom binary biphase mod- 
ulations on the GPS carrier at a chipping rate of 1.023 MHz, thus repeating in one 
millisecond. This “civilian code” was selected for good acquisition properties. 
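The 1023-chip sequence can be generated with two 10-stage shift registers. The sketch below is one common way to write this in Python; the feedback taps and the phase-selector pair (2, 6) for PRN 1 are taken from the published GPS interface specification and are stated here as assumptions, since the glossary entry does not list them. The function name `ca_code` is hypothetical.

```python
CHIP_RATE_HZ = 1.023e6   # chipping rate quoted in the glossary entry
CODE_LENGTH = 1023       # chips per code period


def ca_code(tap_a, tap_b):
    """Return the 1023 chips (0/1) of one C/A Gold code.

    tap_a, tap_b: 1-based stages of register G2 that form the phase selector.
    """
    g1 = [1] * 10   # both registers start with all ones
    g2 = [1] * 10
    chips = []
    for _ in range(CODE_LENGTH):
        # Output chip: last stage of G1 combined with the two selected G2 stages.
        chips.append(g1[9] ^ g2[tap_a - 1] ^ g2[tap_b - 1])
        # Feedback: G1 uses stages 3 and 10; G2 uses stages 2, 3, 6, 8, 9, 10.
        fb1 = g1[2] ^ g1[9]
        fb2 = g2[1] ^ g2[2] ^ g2[5] ^ g2[7] ^ g2[8] ^ g2[9]
        g1 = [fb1] + g1[:9]
        g2 = [fb2] + g2[:9]
    return chips


prn1 = ca_code(2, 6)
period_s = CODE_LENGTH / CHIP_RATE_HZ   # one millisecond
print("".join(str(c) for c in prn1[:10]), period_s)
```

Other satellites would use the same registers with a different tap pair, which is what gives each one its own code.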


Carrier A radio wave whose frequency, amplitude, or phase may be varied by modulation. 


Carrier-aided tracking A strategy that uses the GPS carrier signal to achieve an exact 
lock on the pseudorandom code. 


Carrier frequency The frequency of the unmodulated output of a radio transmitter. The 
L1 carrier frequency is 1575.42 MHz. 
Carrier phase The accumulated phase of the L1 or L2 carrier, measured since locking 
onto the signal. Also called integrated Doppler. 


Channel The circuitry in a receiver to receive the signal from a single satellite. 


Chip The length of time to transmit 0 or 1 in a binary pulse code. The chip rate is the 
number of chips per second. 


Circular error probable (CEP) In a circular normal distribution, the radius of the circle 
containing 50% of the measurements. 
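For a circular normal distribution with the same standard deviation in both coordinates, the probability of falling within radius r is 1 - exp(-r^2 / (2 sigma^2)), so the 50% radius follows in closed form. A small Python sketch of that relation (the function names are illustrative):

```python
import math


def prob_within(radius, sigma):
    """Probability that a circular normal error falls inside the given radius."""
    return 1.0 - math.exp(-radius * radius / (2.0 * sigma * sigma))


def cep_radius(sigma):
    """Radius containing 50% of the points: sigma * sqrt(2 ln 2)."""
    return sigma * math.sqrt(2.0 * math.log(2.0))


sigma = 3.0   # illustrative per-axis standard deviation in meters
print(cep_radius(sigma), prob_within(cep_radius(sigma), sigma))
```

The factor sqrt(2 ln 2) is about 1.177, so the CEP is a little larger than one per-axis standard deviation.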


Clock bias The difference between the clock’s indicated time and GPS time. 
Clock offset Constant difference in the time reading between two clocks. 


Code Division Multiple Access (CDMA) A method whereby many senders use the same 
frequency but each has a unique code. GPS uses CDMA with Gold’s codes for their 
low cross-correlation properties. 


Code phase GPS Measurements based on the coarse C/A code. 


Code-tracking loop A receiver module that aligns a PRN code sequence in a signal with 
an identical PRN sequence generated in the receiver. 


Control segment A world-wide network of GPS stations that ensure the accuracy of 
satellite positions and their clocks. 


Cycle slip A discontinuity in the measured carrier beat phase resulting from a loss-of-lock 
in the tracking loop of a GPS receiver. 


Data message A message in the GPS signal which reports the satellite’s location, clock 
corrections and health (and rough information about the other satellites). 


Differential positioning (DGPS) A technique to improve accuracy by determining the 
positioning error at a known location and correcting the calculations of another re- 
ceiver in the same area. 


Dilution of Precision (DOP) The purely geometrical contribution to the uncertainty in 
a position fix. PDOP multiplies rms range error to give rms position error. Stan- 
dard terms for GPS are GDOP: Geometric (3 coordinates plus clock offset), PDOP: 
Position (3 coordinates), HDOP: Horizontal (2 coordinates), VDOP: Vertical (height 
only), TDOP: Time (clock offset only). PDOP is inversely proportional to the volume 
of the pyramid from the receiver to four observed satellites. A value near PDOP = 3 
is associated with widely separated satellites and good positioning. 
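The DOP values can be computed from the receiver-to-satellite unit vectors alone. The sketch below is a minimal pure-Python illustration under stated assumptions: each row of the geometry matrix is (-ex, -ey, -ez, 1), the cofactor matrix is the inverse of the normal matrix, and the satellite layout (one overhead, three in a ring) is invented for the example. All function names are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    a = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]


def dops(unit_vectors):
    """Return (GDOP, PDOP, TDOP) for unit vectors from receiver to satellites."""
    A = [[-ex, -ey, -ez, 1.0] for ex, ey, ez in unit_vectors]
    N = [[sum(r[i] * r[j] for r in A) for j in range(4)] for i in range(4)]
    Q = invert(N)
    pdop = math.sqrt(Q[0][0] + Q[1][1] + Q[2][2])
    tdop = math.sqrt(Q[3][3])
    return math.sqrt(pdop ** 2 + tdop ** 2), pdop, tdop


def one_overhead_three_around(elev_deg):
    """One satellite at zenith, three at a common elevation, 120 degrees apart."""
    e = math.radians(elev_deg)
    ring = [(math.cos(e) * math.cos(a), math.cos(e) * math.sin(a), math.sin(e))
            for a in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)]
    return [(0.0, 0.0, 1.0)] + ring


print(dops(one_overhead_three_around(0.0)))    # widely separated satellites
print(dops(one_overhead_three_around(60.0)))   # satellites bunched near zenith
```

Bunching the satellites shrinks the pyramid they form with the receiver, and the PDOP grows accordingly.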


Dithering The introduction of digital noise. This is the process that adds inaccuracy to 
GPS signals to induce Selective Availability. 


Doppler-aiding Using a measured Doppler shift to help the receiver track the GPS signal. 
Allows more precise velocity and position measurement. 


Doppler shift The apparent change in frequency caused by the motion of the transmitter 
relative to the receiver. 
Double difference A GPS observable formed by differencing carrier phases (or pseudoranges) 
measured by a pair of receivers i, j tracking the same pair of satellites k, l. 
The double difference $(\varphi_i^k - \varphi_j^k) - (\varphi_i^l - \varphi_j^l)$ removes essentially all of the clock 
errors. 
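A short numerical check shows why the clock terms drop out. In this Python sketch each observation is modeled, as a simplifying assumption, as geometric range plus a receiver clock term minus a satellite clock term (all in meters); the numbers and names are invented for illustration.

```python
def double_difference(obs, i, j, k, l):
    """(obs[i][k] - obs[j][k]) - (obs[i][l] - obs[j][l]) for receivers i, j and satellites k, l."""
    return (obs[i][k] - obs[j][k]) - (obs[i][l] - obs[j][l])


# Invented geometric ranges and clock errors, all expressed in meters.
ranges = {"i": {"k": 20000000.0, "l": 21000000.0},
          "j": {"k": 20000100.0, "l": 20999950.0}}
receiver_clock = {"i": 300.0, "j": -120.0}
satellite_clock = {"k": 45.0, "l": -75.0}

observed = {r: {s: ranges[r][s] + receiver_clock[r] - satellite_clock[s]
                for s in ranges[r]}
            for r in ranges}

print(double_difference(observed, "i", "j", "k", "l"))
print(double_difference(ranges, "i", "j", "k", "l"))
```

The first difference (between receivers) cancels each satellite clock, and the second difference (between satellites) cancels each receiver clock, so both printed values agree.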


Earth-Centered Earth-Fixed (ECEF) Cartesian coordinates that rotate with the Earth. 
Normally the X direction is the intersection of the prime meridian (Greenwich) with 
the equator. The Z axis is parallel to the spin axis of the Earth. 


Elevation Height above mean sea level. Vertical distance above the geoid. 


Elevation mask angle Satellites below this angle (often 15°) are not tracked, to avoid 
interference by buildings, trees and multipath errors, and large atmospheric delays. 


Ellipsoid A surface whose plane sections are ellipses. Geodesy generally works with an 
ellipsoid of revolution (two of the three principal axes are equal): it is formed by 
revolving an ellipse around one of its axes. 


Ellipsoidal height The vertical distance above the ellipsoid (not the same as elevation 
above sea level). 


Ephemeris Accurate position (the orbit) as a function of time. Each GPS satellite trans- 
mits a predicted ephemeris for its own orbit valid for the current hour. The ephemeris 
(repeated every 30 seconds) is a set of 16 Keplerian-like parameters with correc- 
tions for the Earth’s gravitational field and other forces. Available as “broadcast 
ephemeris” or as post-processed “precise ephemeris.” 


Epoch Measurement time or measurement interval or data frequency. 


Fast-switching channel A single channel that rapidly samples a number of satellite sig- 
nals. The switching time is sufficiently fast (2 to 5 milliseconds) to recover the data 
message. 


Frequency spectrum The distribution of signal amplitudes as a function of frequency. 


Geodesy The discipline of point position determination, the Earth’s gravity field, and 
temporal variations of both. 


Geodetic datum An ellipsoid of revolution designed to approximate part or all of the 
geoid. A point on the topographic surface is established as the origin of datum. The 
datum has five parameters: two for the axis lengths of the ellipse and three to give 
position. If the ellipsoid is not aligned with the coordinate axes we must add three 
rotations. 


Geoid The undulating, but smooth, equipotential surface of the Earth’s gravity field. The 
geoid is the primary reference surface, everywhere perpendicular to the force of grav- 
ity (plumb line). 


Gigahertz (GHz) One billion cycles per second = 1000 MHz. 


GLONASS Russia’s Global Navigation Satellite System (Globalnaya Navigatsionnaya 
Sputnikovaya Sistema). 
Global Navigation Satellite System (GNSS) A European system that would incorporate 
GPS, GLONASS, and other segments to support navigation. 


Global Positioning System (GPS) A constellation of 24 satellites orbiting the Earth at 
high altitude. GPS satellites transmit signals that allow one to determine, with great 
accuracy, the locations of GPS receivers. 


GPS ICD-200 The GPS Interface Control Document contains the full technical descrip- 
tion of the interface between satellites and user. 


GPS Time The time scale to which GPS signals are referenced, steered to keep within 
about 1 microsecond of UTC, ignoring the UTC leap seconds. GPS Time equalled 
UTC in 1980, but 10 leap seconds have been inserted into UTC. 


GPS Week The number of elapsed weeks (modulo 1024) since the week beginning Jan- 
uary 6, 1980. The week number increments at Saturday/Sunday midnight in GPS 
Time. 
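The week count is simple calendar arithmetic from the starting date. A minimal Python sketch, assuming the input is already expressed in GPS Time (no leap seconds applied); the function name `gps_week` is illustrative. The first test date is the epoch of the RINEX observation example in Table A7.

```python
from datetime import datetime

GPS_EPOCH = datetime(1980, 1, 6)   # Sunday 00:00:00, start of GPS week 0


def gps_week(t_gps):
    """Return (full week number, week modulo 1024, seconds into the week)."""
    elapsed = t_gps - GPS_EPOCH
    full_week, day_of_week = divmod(elapsed.days, 7)
    seconds_of_week = day_of_week * 86400 + elapsed.seconds
    return full_week, full_week % 1024, seconds_of_week


print(gps_week(datetime(1990, 3, 24, 13, 10, 36)))
print(gps_week(datetime(1999, 8, 22)))   # first rollover of the 10-bit counter
```

Because the broadcast week number is taken modulo 1024, a receiver needs outside information to decide which 1024-week block it is in.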


Handover word (HOW) The second word in each subframe of the navigation message. 
It contains the Z-count at the leading edge of the next subframe and is used by a GPS 
receiver to determine where in its generated P code to start the correlation search. 


Hertz (Hz) One cycle per second. 


Ionosphere The band of charged particles 80 to 120 miles (or often wider) above the 
Earth. 


Ionospheric delay A wave propagating through the ionosphere experiences delay by 
refraction. Phase delay depends on electron content and affects carrier signals. Group 
delay depends on dispersion and affects signal modulation (codes). The phase and 
group delay are of the same magnitude but opposite sign. 


Kalman filter A recursive numerical method to track a time-varying signal and the asso- 
ciated covariance matrix in the presence of noise. The filter combines observation 
equations and state equations. The “Bayes filter” does the calculation of state estimate 
and covariance estimate in a different order. 
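The prediction and correction steps are easiest to see in the scalar case. The sketch below is a minimal Python illustration, assuming a random-walk state model (the state is predicted to stay where it is, with process variance q) and a direct measurement with variance r. It is not the book's M-file code, and the name `kalman_step` is hypothetical.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle for a scalar state.

    x, p: previous estimate and its variance
    z:    new measurement of the state
    q, r: process noise variance and measurement noise variance
    """
    # Prediction: the state equation leaves x unchanged and inflates the variance.
    x_pred = x
    p_pred = p + q
    # Correction: the gain weighs the innovation z - x_pred.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


x, p = 0.0, 1.0
for z in (1.0, 1.0, 1.0):
    x, p = kalman_step(x, p, z, q=0.0, r=1.0)
print(x, p)
```

With q = 0 the recursion reproduces the weighted average of the prior and the three measurements, and the variance falls as information accumulates.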


Keplerian elements Six parameters that describe position and velocity in a purely ellipti- 
cal (Keplerian) orbit: the semimajor axis and eccentricity, the inclination of the orbit 
plane to the celestial equator, the right ascension of the ascending node, the argument 
of perigee, and the time the satellite passes through perigee. 


L-band The radio frequencies from 390 to 1550 MHz, or sometimes 1 to 2 GHz, including 
the GPS carrier frequencies. 


L1 signal The primary GPS signal at 1575.42 MHz. The L1 broadcast is modulated with 
the C/A and P codes and the navigation message. 


L2 signal The second L-band signal is centered at 1227.60 MHz and carries the P code 
and navigation message. 
Microstrip antenna (patch antenna) A type of antenna in many GPS receivers, con- 
structed of (typically rectangular) elements that are photoetched on one side of 
double-coated, printed-circuit board. 


Modem A modulator/demodulator that converts digital to analog (for telephone transmis- 
sion) and back to digital. 


Monitor stations Data collection points linked to a master control station, where correc- 
tions are calculated and uploaded to the satellites. 


Multichannel receiver A receiver with multiple channels, each tracking one satellite con- 
tinuously. 


Multipath Interference caused by reflected GPS signals arriving at the receiver. Signals 
reflected from nearby structures travel longer paths than line-of-sight and produce 
higher positioning errors. 


Multiplexing The technique of rapidly sequencing the signals from two or more satellites 
through one tracking channel. This ensures that navigation messages are acquired 
simultaneously. 


NAD 83 North American Datum 1983. 
Nanosecond One billionth of a second. 


Narrow lane The sum of carrier-phase observations on the L1 and L2 frequencies. The 
effective wavelength is 10.7 centimeters. 
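The 10.7 centimeter figure follows from the two carrier frequencies: adding the phases corresponds to adding the frequencies. A short Python check, using the L1 and L2 frequencies given elsewhere in this glossary and the defined speed of light (the variable names are illustrative):

```python
C = 299_792_458.0   # speed of light in m/s
F_L1 = 1575.42e6    # L1 carrier frequency in Hz
F_L2 = 1227.60e6    # L2 carrier frequency in Hz


def wavelength(frequency_hz):
    """Wavelength in meters of a signal with the given frequency."""
    return C / frequency_hz


lambda_l1 = wavelength(F_L1)
lambda_l2 = wavelength(F_L2)
lambda_narrow = wavelength(F_L1 + F_L2)   # narrow lane: sum of the frequencies
print(lambda_l1, lambda_l2, lambda_narrow)
```

The narrow-lane wavelength is shorter than either carrier wavelength, which is where the name comes from.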


Nav message The 1500-bit navigation message broadcast by each GPS satellite at 50 bps. 
This message contains satellite position, system time, clock correction, and iono- 
spheric delay parameters. 


Orthometric height The height of a point above the geoid. 


P code The precise code of the GPS signal, typically used by military receivers. A very 
long sequence of pseudo-random binary biphase modulations on the GPS carrier at a 
chip rate of 10.23 MHz which repeats about every 267 days. Each segment is unique 
to one GPS satellite and is reset each Saturday/Sunday midnight. 


Position Dilution Of Precision (PDOP) (see DOP) The factor that multiplies errors in 
ranges to give approximate errors in position. 


Phase lock loop (Carrier tracking loop) The receiver compares the phases of an oscil- 
lator signal and reference signal. The reference frequency is adjusted to eliminate 
phase difference and achieve locking. 


Point positioning A geographical position produced from one receiver. 


Precise Positioning Service (PPS) The highest level of dynamic positioning accuracy by 
single-receiver GPS, with access to the dual-frequency P code and removal of SA. 


Pseudolite (shortened form of pseudo-satellite) A ground-based transmitter that simulates 
the GPS signal. The data may also contain differential corrections so other 
receivers can correct for GPS errors. 
Pseudorandom Noise (PRN) A sequence of 1’s and 0’s that appears to be random but 
can be reproduced exactly. Their most important property is a low correlation with 
their own delays. Each GPS satellite has unique C/A and P pseudorandom noise 
codes. 


Pseudorange A distance measurement from satellite to receiver (by the delay lock loop) 
that has not been corrected for clock differences. 


Range rate The rate of change of distance between satellite and receiver. Range rate is 
determined from the Doppler shift of the satellite carrier. 


Relative positioning DGPS measures relative position of two receivers by observing the 
same set of satellites at the same times. 


Reliability The probability of performing a specified function without failure under given 
conditions for a specified period of time. 


Receiver INdependent EXchange format (RINEX) A set of standard definitions and 
formats for time, phase, and range that permits interchangeable use of GPS data: 
pseudorange, carrier phase, and Doppler. The format also includes meteorological 
data and site information. 


Root mean square (rms) The square root of the mean of the squares. For a residual 
$Ax - b$ with $m$ components this is $\|Ax - b\|/\sqrt{m}$, which is minimized by the least-squares solution $\hat{x}$. 


Satellite constellation The GPS constellation has six orbital planes, each containing four 
satellites. GLONASS has three orbital planes containing eight satellites. 


Selective Availability (SA) A program that limits the accuracy of GPS pseudorange mea- 
surements, degrading the signal by dithering the time and position in the navigation 
message. The error is guaranteed to be below 100 meters, 95% of the time. It will 
soon be discontinued. 


Spherical Error Probable (SEP) The radius of a sphere within which there is a 50% 
probability of locating the true coordinates of a point. 


Spread spectrum A signal normally requiring a narrow transmission bandwidth but 
spread over a much larger bandwidth. The 50 bits-per-second GPS navigation mes- 
sage could use a bandwidth of 50 Hz, but it is spread by modulating with the pseu- 
dorandom C/A code. Then all satellites can be received unambiguously. 


Squaring channel A GPS receiver channel that multiplies the received signal by itself to 
remove the code modulation (since $(-1)^2 = 1$). 


Standard deviation (σ) A measure of the dispersion of random errors about their mean. 
Experimentally, the standard deviation is the square root of the sum of the squares of 
deviations from the mean divided by the number of observations less one. 
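The experimental formula can be written out directly and compared with the standard library. A minimal Python sketch (the data values are invented, and `sample_std` is an illustrative name):

```python
import math
import statistics


def sample_std(values):
    """Square root of the summed squared deviations divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # invented observations
print(sample_std(data), statistics.stdev(data))
```

Dividing by n - 1 rather than n accounts for the one degree of freedom used up in estimating the mean from the same data.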


Standard Positioning Service (SPS) The normal civilian positioning accuracy obtained 
by using the single frequency C/A code. 
Triple difference The difference in time of doubly differenced carrier-phase observations. 
The triple difference is free of integer ambiguities. It is useful for detecting cycle 
slips. 


Universal Time Coordinated (UTC) A highly accurate and stable atomic time system 
kept very close, by inserting leap seconds, to the universal time corrected for seasonal 
variations in the Earth’s rotation rate. Maintained by the U. S. Naval Observatory. 
(The changing constant from GPS time to UTC is 10 seconds today.) 


User Equivalent Range Error (UERE) Any positioning error expressed as an equivalent 
error in the range between receiver and satellite. The total UERE is the square root of 
the sum of squares of the individual errors, assumed to be independent. A prediction 
of maximum total UERE (minus ionospheric error) is in each satellite’s navigation 
message as the user range accuracy (URA). 
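Combining independent error sources is a root-sum-square, and the DOP entry of this glossary then scales the result to a position error. A small Python sketch with invented component values (the names and numbers are illustrative, not a real error budget):

```python
import math


def total_uere(components):
    """Root-sum-square of independent range error components (meters)."""
    return math.sqrt(sum(e * e for e in components))


def position_error(pdop, uere):
    """Approximate rms position error: PDOP times rms range error."""
    return pdop * uere


budget = {"source A": 3.0, "source B": 4.0}   # invented values in meters
uere = total_uere(budget.values())
print(uere, position_error(2.0, uere))
```

The independence assumption is what justifies adding the squares; correlated errors would need their covariances as well.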


Wide Area Augmentation System (WAAS) A form of “Wide Area DGPS” with correc- 
tions from a network of reference stations. 


WGS 84 A geocentric reference ellipsoid, a coordinate system, and a gravity field model 
(in terms of harmonic coefficients). The GPS satellite orbits have been referenced to 
WGS 84 since January 1987. Redefined at the epoch 1994.0 and designated WGS 84 
(G730). 


Y code The encrypted version of the P code. 


Z-Count The GPS time unit (29 bits). GPS week number and time-of-week in units of 
1.5 seconds. A truncated TOW with 6-second epochs is included in the handover 
word. 
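The 29 bits split into a week number and a time-of-week count. The Python sketch below assumes the layout given in the GPS interface specification (10 most significant bits for the week, 19 least significant bits for the count of 1.5-second epochs, of which the handover word carries the 17 most significant bits); the function name `split_z_count` is hypothetical.

```python
def split_z_count(z):
    """Split a 29-bit Z-count into (week, seconds of week, truncated 6-s count)."""
    week = (z >> 19) & 0x3FF     # 10-bit week number (modulo 1024)
    tow_count = z & 0x7FFFF      # 19-bit count of 1.5-second epochs
    truncated = tow_count >> 2   # dropping 2 bits gives 6-second epochs
    return week, tow_count * 1.5, truncated


# Week 532 and 565836 seconds into the week, packed into one integer.
z = (532 << 19) | int(565836 / 1.5)
print(split_z_count(z))
```

Dropping the two low bits multiplies the unit by four, which is how 1.5-second epochs become the 6-second epochs of the handover word.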


Zenith distance Angle at your position between the direction to zenith and the direction 
to a target. 
abs_pos 503 
accum0 495 
anheader 494 
ash_dd 489, 495 
autocorr 529 

b_point 501, 502 
b_row 580 
bancroft 481, 482, 500, 502 
bdata 489 

c2g 470, 472 
c2gm 470 
check_t 487 
corrdemo 326 
dw 393 

e_r_corr 502 
edata 489 
elimnor 397 
elimobs 400 
ellaxes 342 

errell 337 
fepoch_0 494 
find_eph 488, 489, 501 
findloop 302 
findnode 302 
fixing1 581 
fixing2 582 
frgeod 368 

g2c 470 

gauss1 471 
gauss2 471 
get_eph 486, 488, 489, 501 
gmproc 520 
gps_time 487 
grabdata 494 
julday 487 
k_dd3 506, 507 
k_dd4 507 
k_point 502, 580 
k_row 580 
k_simul 578 
k_ud 502 
kalclock 510 
kleus 505 

kud 580 

lev 286 

locate 495 
looplist 302 

lor 444 

lorentz 500, 502 
mangouli 541 
model_g 539 
normals 502, 580 
null 302 
one_way 495, 512 
oned 330 
oneway_i 535 
plu 302 
proc_dd 494 

qr 371 

recclock 509 
recpos 481, 488 
ref 302 

relellip 342 
repair 322 
rinexe 486, 494 
rts 571 
satpos 486, 501, 502 
satposin 486 
sdata 489 
sets 441 
simil 419 
smoother 572 
support 339 
togeod 368, 501, 502 
topocent 501 
tropo 456, 475, 491, 493, 501 
tropp 501 
twod 311 
v_loops 303, 493 
vectors 303 
wc 581 


-1, 2, -1 matrix 69, 77, 98, 204, 242, 378, 394 

$A = LDL^T$ 89, 241 

$A = LDU$ 78, 83, 85 

$A = LU$ 76, 83 

$A = PLU$ 116 

$A = Q\Lambda Q^T$ 243, 244 

$A = QR$ 190, 191, 194 

$A = S\Lambda S^{-1}$ 222, 231 

$A = U\Sigma V^T$ 258, 263 

$AA^T$ 264 

$A^TA$ 171, 245 

acceleration 574 

adjustment 300 

almanac 601 

ambiguity 458, 463, 489, 495, 601 

analog 601 

angle 14 

antenna 492, 495 

antispoofing 601 

AR(1) 526 

arrow 4, 7 

ascending node 482, 485, 486 

Ashtech 482, 489 

associative law 49, 57, 62 

asymptotic 554 

atomic clock 448, 455, 601 

augmented matrix 50, 54, 69, 123 

autocorrelation 517, 519, 523, 530, 531 

autoregression 521, 526 

axes 244 

azimuth 354, 363, 469, 470, 475, 488, 
502, 601 


back substitution 37, 79, 82 
backward filter 571 
INDEX 


bandwidth 522, 601 
barycenter 407, 415, 418, 419, 432, 443 
baseline 490 

basepoint 347 

basis 137, 139 

Bayes filter 510, 566, 567 
bearing 343, 601 

Bernese 481 

binary biphase modulation 601 
block elimination 397 

block inverse matrix 560 
block matrix 58 

block multiplication 58 

block tridiagonal matrix 560 
BLUE 323 

bordering 559 

breakdown 31, 38, 43 


C/A code 450, 451, 601 
canonical vector 376 

carrier frequency 601 

carrier phase 602 

carrier-aided tracking 601 
Cayley-Hamilton 232 

Central Limit Theorem 319 
change of basis 259, 260 
channel 602 

characteristic equation 215 
checkerboard matrix 156 

$\chi^2$ distribution 312 

chip 449, 602 

chipping rate 490 

Cholesky 86, 247, 372, 395, 401 
Cholesky factorization 336, 372-374, 554 
circular error probable 602 
clock 9, 455 

clock bias 602 
clock offset 507-509, 524, 525, 573, 602 

clock reset 509 

closest line 175, 176, 178, 182 

code division multiple access 602 

code observation 460 

code phase 602 

code-tracking loop 602 

coefficient matrix 29 

cofactor matrix 209 

column picture 29, 31, 39 

column space 104, 148 

column vector 7 

columns times rows 59, 64 

combination of columns 31, 47, 56 

commutative law 49, 57, 62 

complete solution 112, 124, 129 

complete system 545 

complex conjugate 235, 237 

complex matrix 64 

components 4 

condition equation 300, 409 

condition number 378, 382, 386, 388 

confidence ellipse 317, 337, 338, 341, 
359, 390 

confidence interval 311, 316 

consistent 126 

constellation 452 

control network 423-428 

control points 426 

control segment 602 

coordinate measurement 354 

correction 543, 545, 551, 562 

correlate equation 301 

correlation 322, 331 

correlation time 521 

correlogram 531 

cosine 16, 20 

covariance 320 

covariance function 389 

covariance matrix 321, 324, 544 

covariance propagation 329 

Cramer’s rule 206, 207 

cross-correlation 518 

cube 9 


cumulative density 310, 315 
Current Law 285, 293 
cycle slip 459, 602 


data message 602 

datum 473, 477, 479 

decorrelation 332, 401, 496, 498 

degrees of freedom 315, 334 

Delay Lock Loop 449 

dependent 134, 136 

derivation 558 

determinant 66, 197, 215, 217, 556 

determinant of inverse 202 

determinant of product 202 

determinant of transpose 202 

diagonalizable 224 

diagonalization 222 

differential GPS 330, 458 

differential positioning 602 

Dilution of Precision 458, 462, 463, 602 

dimension 114, 139, 140, 148, 150, 290 

direction measurement 434 

dispersive 454, 455 

distance 22, 27, 171, 345, 350, 366 

distance measurement 434 

distributive law 57 

dithering 447, 602 

Doppler 449, 602 

Doppler-aiding 602 

dot product 11, 15, 16, 88 

double difference 459, 463, 467, 535, 538, 
603 

dual 297, 300 


eccentric anomaly 483 
ECEF 462, 474, 482, 487, 603 
echelon matrix 113 
ecliptic 482 
eigenvalue 211, 215 
eigenvalues 
complex 237 
positive 237, 238, 242 
product of 217, 219 
real 233, 235 


repeated 223 

sum of 217, 219 
eigenvalues of $A^2$ 212 
eigenvector 211, 216 
Einstein 47, 451 
elevation 603 
elevation angle 502, 536 
elevation mask angle 603 
elimination 37, 80, 111, 394, 496 
elimination matrix 48, 51, 76 
ellipse 243, 244, 338, 467 
ellipsoid 467, 473, 603 
ellipsoidal height 603 
ensemble average 519, 520 
entry 47 
enu system 363, 474, 475, 501 
ephemerides 448, 481, 484, 488, 603 
epoch 459, 603 
equinox 482 
ergodic 519 
error ellipse 332, 337 
error vector 168 
Euler’s formula 284, 294, 305 
even permutation 198 
exponential correlation 527, 533 
extended Kalman filter 509 


F distribution 318 
factorization 76 
fast-switching channel 603 
Fibonacci 225, 229, 555 
finite element method 299 
fitting line 442, 443 

fixed solution 459 

fixed value 423 

flattening 467 

float solution 459, 496, 499 
formula for $A^{-1}$ 208 
FORTRAN 8, 17 

forward elimination 79, 82 


four fundamental subspaces 147, 150, 265, 
268, 290 
Fourier series 187 
fractile 315 
Fredholm 163 

free 39, 110 

free network 411, 415, 428 

free stationing 356, 361, 431, 433, 435 
free variable 112, 114 

frequency spectrum 603 

full rank 128, 131 

function space 102, 142, 146 
Fundamental Theorem 151, 160, 163, 259 


gain matrix 543, 549, 551, 562-564 
GAMIT 481 

GAST 484 

Gauss 323, 392, 470, 563 
Gauss-Jordan 68, 69, 74, 115, 133 
Gauss-Markov process 520, 521, 526, 541 
Gauss-Markov Theorem 568 

Gaussian elimination 41, 115 

Gaussian process 517, 519 

Gelb 552 

geodesy 603 

geodetic datum 603 

geographical coordinates 362, 468, 475 
geoid 472, 476, 603 

geometric mean 17, 19 

Global Navigation Satellite System 604 
Global Positioning System 604 
GLONASS 449, 603 

Goad 367, 490, 506 

Gold code 450, 451 

GPS ICD-200 604 

GPS Time 604 

GPS Week 604 

Gram-Schmidt 188, 189, 371, 372, 557 
graph 282 

group delay 465 

GYPSY 481 


handover word 604 

hard postulation 409 
heading 574 

height 469, 472 

height differences 283 
height measurement 354 
heights 275 

postulated 276 
Heisenberg 225, 232 
Helmert 419 
Hilbert matrix 75 
homogeneity 390 
homogeneous solution 124, 129 
horizontal direction 354, 364 
house 253, 256, 257 
hyperbola 449 
hyperboloid 503 


identity matrix 30, 48 

incidence matrix 282, 283, 287, 305 
inclination 482, 485, 486 

income 12 

increments 343, 344 

independent 134, 137 

independent columns 135, 140, 150, 161 
independent variables 321 

infinite entries 511 

information filter 560, 569 
information matrix 324, 554, 566 
inner product 11 

innovation 543, 549 

integer least squares 497 

inverse block matrix 425 

inverse matrix 65, 69 

inverse of AB 67, 73 

invertible 66, 71 

ionosphere 455, 457, 604 
ionospheric delay 456, 490, 506, 604 
isotropy 390 

iteration 248, 347 


Jacobi 249 

Jacobian matrix 330, 491 
Jordan form 250 

Joseph form 563 


Kalman 563, 567 
Kalman filter 482, 543, 552, 555, 604 
Kepler’s equation 483 


Keplerian elements 448, 481, 482, 484, 
485, 604 

kernel 252, 255 

kinematic receiver 526 

Krarup 383, 394, 409 

Kronecker product 384 


L-band 604 

L1 signal 604 

L2 signal 604 

Lagrange multiplier 297, 424 
Lagrangian function 298, 424 
LAMBDA 482, 495, 498 
latency 513 

Law of Cosines 20 

Law of Propagation 329, 333 
least-squares 174, 176 

left nullspace 146, 149, 151, 154 
left-inverse 66 

length 13, 186 

leveling 275, 367, 378, 405 
linear combination 6, 10, 30, 104 
linear independence 134 

linear programming 277 

linear regression 387, 442, 443 
linear transformation 251, 253 
linearization 344, 349, 357, 381, 382, 460 
LINPACK 79 

loop 291, 302 

loop law 285 

LU factorization 77 


magic matrix 36 
mapping function 456 
marginal distribution 312 
Markov matrix 35, 213, 220, 224 
MATLAB 8, 17, 81 
matrix 29, 54 
-1, 2, -1 69, 77, 98, 204, 242, 378, 
394 
augmented 50, 54, 69, 123 
block 58 
block inverse 560 
block tridiagonal 560 


checkerboard 156 
coefficient 29 
cofactor 209 
complex 64 
covariance 321, 324, 544 
echelon 113 
elimination 48, 51, 76 
gain 543, 549, 551, 562-564 
Hilbert 75 
identity 48 
incidence 282, 283, 287, 305 
information 324, 554, 566 
inverse 65, 69 
inverse block 425 
Jacobian 330, 491 
magic 36 
Markov 35, 213, 220, 224 
nullspace 117, 118 
orthogonal 185, 214, 233, 263, 375 
permutation 50, 90, 97 
positive definite 239, 240, 242, 246, 
281, 322 
projection 167, 169, 173, 213, 236 
reflection 185, 196, 213, 261 
reverse identity 203 
rotation 185, 214 
semidefinite 240, 241 
similar 259, 260 
singular 201 
skew-symmetric 98, 204, 214 
square root 247, 569 
symmetric 89, 214, 233 
triangular 201 
tridiagonal 70, 80, 85, 546 
weight 323, 466 
zero 55 
matrix multiplication 49, 55 
matrix notation 30, 32 
matrix space 103, 107, 141, 145, 256 
matrix times vector 32, 33, 47 
mean 182, 309, 319, 516 
mean anomaly 483, 485 
mean motion 483 
mean sea level 472 
meridian 468 
microstrip antenna 605 
minimum 239 

modem 605 

moment 11 

monitor station 605 
multichannel receiver 605 
multipath 456, 457, 605 
multiplexing 605 
multiplicity 228 
multiplier 38, 76, 556 


NAD 83 605 

narrow lane 512, 605 

nav message 605 

navigation data 481 

network 284, 379, 384, 386, 389, 406 
node law 285, 303 

nondiagonalizable 223, 228 
nonlinearity 343 

nonsingular 40 

norm 13, 17, 277 

normal distribution 309, 311, 312, 337 
normal equation 169, 177, 269, 280, 435 
notation 551 

nullspace 109, 113, 117, 148 
nullspace matrix 117, 118 

nullspace of AᵀA 171

numerical precision 348, 349 


observation equation 349, 355, 545 

odd permutation 198 

Ohm’s law 296 

operation count 56, 68, 80, 81 

orbit 484 

orientation unknown 355, 431, 438, 441 

orthogonal 20 

orthogonal complement 159, 160, 164 

orthogonal matrix 185, 214, 233, 263, 375

orthogonal subspaces 157-159

orthogonalization 568 

orthometric height 605 

orthonormal 184 
orthonormal basis 258, 262, 266
orthonormal columns 184, 186, 192 
outer product 56, 59 


P code 450, 451, 605 

PA = LU 91-93 

parabola 179, 181 

parallel machine 81 

parallelogram 5, 7 

particular solution 25, 113, 124, 129 

Pascal 86 

PDOP 463 

perigee 482, 485, 486 

permutation matrix 50, 90, 97 

perpendicular 15, 18 

perpendicular eigenvectors 214, 233, 235 

phase 458 

phase lock loop 605 

phase observation 464 

phase velocity 454 

pivot 38, 39, 556 

pivot column 117, 138 

pivot variable 112 

plane 20, 21, 24 

plumb line 364, 473 

point error 437 

point positioning 605 

polar decomposition 267 

Position Dilution of Precision 605 

position error 447, 454 

positive definite matrix 239, 240, 242, 246, 281, 322

postulated coordinates 276, 423, 424 

power spectral density 520, 523 

powers of a matrix 211, 224, 227, 231, 250

Precise Positioning Service 605 

preconditioner 248 

prediction 543, 545, 551, 562, 564 

President Clinton 447 

principal axis 233, 244 

probability density 309 

product of determinants 202 

product of pivots 201 


projection 165, 166, 169, 281, 373, 393 

projection matrix 167, 169, 173, 213, 236 

projection onto a subspace 168 

propagation law 329, 333, 462, 466, 523, 562

pseudodistance 352 

pseudoinverse 161, 267, 270, 405, 411, 413

pseudolite 605 

pseudorandom 450, 606 

pseudorange 330, 448, 449, 500, 606 

Ptolemy 469 

pulse rate 546 

Pythagoras 9 


QR factorization 335 
quasi-distance 351 


random process 320, 515, 523 

random ramp 516, 524, 527 

random variable 309, 413 

random walk 517, 524, 527-529 

range 105, 252, 255 

range error 457 

range rate 606 

rank 122, 127, 150 

rank deficient 408 

rank one 132, 152, 167, 270 

ratios of distances 353 

real-time 513 

recursive least squares 183, 543, 545, 549 

reduced echelon form 69, 70, 114, 117, 125

reference ellipsoid 362, 471 

reflection matrix 185, 196, 213, 261 

refraction 454 

relative confidence ellipse 341, 342 

relative positioning 606 

relativity 451 

reliability 606 

repeated eigenvalues 223 

residual 248, 282, 348 

reverse identity matrix 203 

reverse order 67, 87 


right angle 9, 15 
right-inverse 66 
RINEX 481, 486, 606 
Rⁿ 101

robust 277 

root mean square 606 
rotation 460, 477, 492 
rotation matrix 185, 214 
roundoff error 378 
row exchange 50, 90 
row picture 28, 31, 39 
row space 137, 147 
row vector 7 
row-version 580 

RTS recursion 570 


S⁻¹AS = Λ 222

saddle 248 

Sagnac Effect 451 

satellite constellation 606 

scalar 4, 102 

Schur identity 511 

Schwarz inequality 16, 18, 20 

Selective Availability 447, 452, 606 

semidefinite matrix 240, 241 

sequential least squares 482 

Sherman-Morrison-Woodbury-Schur 561

shift register 450 

sigma notation 47 

sign reversal 199 

similar matrix 259, 260 

similarity transformation 414, 421, 431-433

sinc function 522 

single difference 535, 538 

singular 40, 275 

singular matrix 201 

singular normal equations 406 

singular value 376 

singular value decomposition see SVD 

skew-symmetric matrix 98, 204, 214 

smoother 571 

smoothing 551, 569, 572 

soft postulation 409 
solvable 104, 123, 126

span 136, 137, 143 

special solution 112, 148 
spectral radius 249 

spectral theorem 233 

speed of light 476 

spherical error probable 606 
spread spectrum 449, 606 
spreadsheet 12 

square root matrix 247, 569
square-root filter 567 

squaring channel 606 

stability 250 

standard basis 138, 260 

standard deviation 183, 309, 319, 606 
Standard Positioning Service 606 
state equation 524, 544, 545, 550 
state vector 515, 550 

static receiver 526 

station adjustment 438 

stationary 517, 518 

steady model 511, 573, 576, 577 
subspace 103, 107 

support function 339 

SVD 258, 263, 269, 375, 410 
symmetric matrix 89, 214, 233 
symmetric product 89 


Teaching Codes 93 

Teunissen 420, 482, 495 
three-dimensional model 362 
time average 519 

topocentric 477 

total least squares 442 

trace 217, 219, 231, 414, 438, 564 
transparent proof 95 

transpose 87 

transpose of AB 87 

tree 291, 305 

triangle inequality 19 

triangular matrix 201 

tridiagonal matrix 70, 80, 85, 546 
trigonometric leveling 360, 362 
trilateration 448 
triple difference 459, 467, 607
troposphere 456, 457, 493 
tropospheric delay 456 

true anomaly 483, 485 


unbiased 323, 327 

Uncertainty Principle 225, 232, 522 
unique solution 127, 131 

unit circle 14 

unit vector 13, 14 

Universal Time Coordinated 451, 607 
update 543, 549, 559, 561 

upper triangular 37 

User Equivalent Range Error 607 


variance 183, 319 

variance of unit weight 279, 301, 325, 333, 335, 359, 441

variogram 532 

VDOP 454 

vector addition 4, 5, 102 

vector space 101, 106 

Voltage Law 276, 293 

volume 199 


wall 158 

water vapor 456 

wavelets 174 

weight matrix 323, 466
  modified 398

weight normalization 358 

weighted least squares 279, 336 

weighted normal equation 281 

weights 391, 403, 550 

WGS 84 449, 468, 472, 476, 479, 486, 607

white noise 324, 518, 522 

Wide Area Augmentation System 607 

wide lane 512 

Wiener process 518 


Y code 450, 607 


z-transform 539, 540 
Z-count 607 

zenith distance 363, 365, 607 
zero matrix 55 